On 2021/8/11 17:16, Richard Biener wrote:
On Wed, 11 Aug 2021, Xionghu Luo wrote:



On 2021/8/10 22:47, Richard Biener wrote:
On Mon, 9 Aug 2021, Xionghu Luo wrote:

Thanks,

On 2021/8/6 19:46, Richard Biener wrote:
On Tue, 3 Aug 2021, Xionghu Luo wrote:

loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.


diff base/loop-cond-split-1.c.151t.lsplit  
patched/loop-cond-split-1.c.151t.lsplit:
...
      int prephitmp_16;
      int prephitmp_25;

      <bb 2> [local count: 118111600]:
      if (n_7(D) > 0)
        goto <bb 4>; [89.00%]
      else
        goto <bb 3>; [11.00%]

      <bb 3> [local count: 118111600]:
      return;

      <bb 4> [local count: 105119324]:
      pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
      # i_13 = PHI <i_10(20), 0(4)>
      # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
      if (prephitmp_12 != 0)
        goto <bb 6>; [33.00%]
      else
        goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
      _2 = do_something ();
      ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
      # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
      i_10 = inc (i_13);
      if (n_7(D) > i_10)
        goto <bb 21>; [89.00%]
      else
        goto <bb 11>; [11.00%]

      <bb 11> [local count: 105119324]:
      goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
      if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
      else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
      goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
      # i_23 = PHI <i_10(21)>
      # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
      # i_15 = PHI <i_23(19), i_22(16)>
      # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
      i_22 = inc (i_15);
      if (n_7(D) > i_22)
        goto <bb 16>; [89.00%]
      else
        goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
      goto <bb 12>; [100.00%]

    }

gcc/ChangeLog:

        * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
        (do_split_loop_on_cond): Likewise.
---
    gcc/tree-ssa-loop-split.c | 16 ++++++++--------
    1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3a09bbc39e5..8e5a7ded0f7 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -583,10 +583,10 @@ split_loop (class loop *loop1)
        basic_block cond_bb;

        if (!initial_true)
-         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+       edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+                          ? EDGE_SUCC (bbs[i], 0)
+                          : EDGE_SUCC (bbs[i], 1);

class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-                                          profile_probability::always (),
-                                          profile_probability::always (),
-                                          profile_probability::always (),
-                                          profile_probability::always (),
+                                          true_edge->probability,
+                                          true_edge->probability.invert (),
+                                          true_edge->probability,
+                                          true_edge->probability.invert (),
                                           true);

there is no 'true_edge' variable at this point.

Sorry, missed the above hunk when split the patch.


        gcc_assert (loop2);
@@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
      initialize_original_copy_tables ();
struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-                                    profile_probability::always (),
-                                    profile_probability::never (),
-                                    profile_probability::always (),
-                                    profile_probability::always (),
+                                    invar_branch->probability.invert (),
+                                    invar_branch->probability,
+                                    invar_branch->probability.invert (),
+                                    invar_branch->probability,
                                     true);
      if (!loop2)
        {

The patch introduction seems to talk about do_split_loop_on_cond only.

split_loop faces similar issue though it sets the two branches to 100% vs 100%
and no scaling which seems also incorrect.

Since loop versioning inserts a condition with the passed probabilities
but in this case a 'boolean_true_node' condition the then and else
probabilities passed look correct.  It's just the scaling arguments
that look wrong?  This loop_version call should get a comment as to
why we are passing probabilities the way we do.

This optimization is transforming from:

for (i = 0; i < n; i = inc (i))
      {
        if (ga)
          ga = do_something ();
}

to:

    for (i = 0; i < x; i = inc (i))
{
      if (true)
           ga = do_something ();
          if (!ga)
            break;
}
    for (; i < n; i = inc (i))
{
      if (false)
           ga = do_something ();
}


`boolean_true_node` is passed in to make the first loop's condition
statement to be 'true', after returning from loop_version, there is a
piece of code forcing the condition in second loop to be false,
and the original condition is moved from *in loop* to *exit edge*
between loop1 and loop2.

Yes, one complication is that we use loop_version but do not retain
the CFG it creates.  Something like the vectorizers
slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
but then that code doesn't do any scaling yet.  But then
loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
we could write a variant that simply doesn't mangle the CFG
with a new condition switching between both loops but keeps them
executed after each other ...

As said, this adds to the confusion and some awkwardness.

Then loop_version in loop split requires two types of variant, one
is to insert condition to loop preheader for 'split_loops' usage,
another is to insert condition to loop exit for 'do_split_loop_on_cond'
usage, it needs one extra function to encapsulate these cfg alterations
from outside to inside.

unswitching only execute one loop as it only moves the invariant condition
to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
may execute both loops no matter the condition is moved to the first loop's
preheader or exit.

The condition stmt in loop unswitching is invariant, but it is variant
in loop splitting, that's why loop unswitching execute only one loop
but loop splitting executes both loops.

Seems we need two condition arguments for loop_version, one for connecting
loop1 preheader to loop2 preheader, another one for connecting loop1's exit
to loop2's header?  Then it will be more generic for both unswitching pass
and splitting pass.  Looks a bit complicated and conflicted with loop_version's
comments:

/* Main entry point for Loop Versioning transformation.

    This transformation given a condition and a loop, creates
    -if (condition) { loop_copy1 } else { loop_copy2 }, ... */


And this only works for loop split usage, those many other places
doesn't use loop_version like this?

Yes, as said if you don't want the above CFG then you probably
shouldn't use loop_version but instead use its building blocks
(and some refactoring of loop_version can make that easier).

I think splitting wants

   loop_copy1
   if (condition)
     loop_copy2

IMHO it would be good to split 'loopify' into the actual loopification
and the scaling.  Basically make the part of loop_version that
copies the loop on the header edge and creates a loop structure for
it separate.

loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
(copy_loop_before).


Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
copying loops with single exit, it would cause many ICEs in it even
building GCC stage1 (line 1065, line 1184 due to exit or new_exit
is NULL returning from single_exit (loop).).  Seems loop_version is
more flexible for loop split.


https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/tree-vect-loop-manip.c;h=4988c93fdb61507a26430651b416ae61b217793a;hb=HEAD


--
Thanks,
Xionghu

Reply via email to