On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/9/23 20:17, Richard Biener wrote:
>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/8/11 17:16, Richard Biener wrote:
>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2021/8/10 22:47, Richard Biener wrote:
>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>>>>
>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>>>>> count and probability should also be duplicated instead of (100% vs
>>>>>>>>> INV),
>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
>>>>>>>>> the
>>>>>>>>> original loop.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
>>>>>>>>> ...
>>>>>>>>>       int prephitmp_16;
>>>>>>>>>       int prephitmp_25;
>>>>>>>>>
>>>>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>>>>       if (n_7(D) > 0)
>>>>>>>>>         goto <bb 4>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 3>; [11.00%]
>>>>>>>>>
>>>>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>>>>       return;
>>>>>>>>>
>>>>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>>>>       pretmp_3 = ga;
>>>>>>>>>
>>>>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>>         goto <bb 6>; [33.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 7>; [67.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>>>>       _2 = do_something ();
>>>>>>>>>       ga = _2;
>>>>>>>>>
>>>>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>>>>       i_10 = inc (i_13);
>>>>>>>>>       if (n_7(D) > i_10)
>>>>>>>>>         goto <bb 21>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>
>>>>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>>>>       goto <bb 3>; [100.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>> -    goto <bb 20>; [100.00%]
>>>>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>>>>       else
>>>>>>>>> -    goto <bb 19>; [INV]
>>>>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>>>>       goto <bb 5>; [100.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 19> [count: 0]:
>>>>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>>>>
>>>>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>>>>       i_22 = inc (i_15);
>>>>>>>>>       if (n_7(D) > i_22)
>>>>>>>>>         goto <bb 16>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>>>>       goto <bb 12>; [100.00%]
>>>>>>>>>
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>>>>>  (do_split_loop_on_cond): Likewise.
>>>>>>>>> ---
>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>>>>      basic_block cond_bb;
>>>>>>>
>>>>>>>         if (!initial_true)
>>>>>>> -         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>> +
>>>>>>> +       edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>>>> +                          ? EDGE_SUCC (bbs[i], 0)
>>>>>>> +                          : EDGE_SUCC (bbs[i], 1);
>>>>>>>
>>>>>>>>>     
>>>>>>>>>       class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>>>>> -                                        profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -                                        profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -                                        profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -                                        profile_probability::always
>>>>>>>>> (),
>>>>>>>>> +                                        true_edge->probability,
>>>>>>>>> +
>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>> +                                        true_edge->probability,
>>>>>>>>> +
>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>>             true);
>>>>>>>>
>>>>>>>> there is no 'true_edge' variable at this point.
>>>>>>>
>>>>>>> Sorry, missed the above hunk when split the patch.
>>>>>>>
>>>>>>>>
>>>>>>>>>      gcc_assert (loop2);
>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
>>>>>>>>> edge invar_branch)
>>>>>>>>>       initialize_original_copy_tables ();
>>>>>>>>>     
>>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
>>>>>>>>> NULL,
>>>>>>>>> -                                  profile_probability::always (),
>>>>>>>>> -                                  profile_probability::never (),
>>>>>>>>> -                                  profile_probability::always (),
>>>>>>>>> -                                  profile_probability::always (),
>>>>>>>>> +                                  invar_branch->probability.invert
>>>>>>>>> (),
>>>>>>>>> +                                  invar_branch->probability,
>>>>>>>>> +                                  invar_branch->probability.invert
>>>>>>>>> (),
>>>>>>>>> +                                  invar_branch->probability,
>>>>>>>>>                                            true);
>>>>>>>>>       if (!loop2)
>>>>>>>>>         {
>>>>>>>>
>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>>>>
>>>>>>> split_loop faces similar issue though it sets the two branches to 100% 
>>>>>>> vs
>>>>>>> 100%
>>>>>>> and no scaling which seems also incorrect.
>>>>>>>
>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>>>>> why we are passing probabilities the way we do.
>>>>>>>
>>>>>>> This optimization is transforming from:
>>>>>>>
>>>>>>> for (i = 0; i < n; i = inc (i))
>>>>>>>       {
>>>>>>>         if (ga)
>>>>>>>           ga = do_something ();
>>>>>>> }
>>>>>>>
>>>>>>> to:
>>>>>>>
>>>>>>>     for (i = 0; i < x; i = inc (i))
>>>>>>> {
>>>>>>>       if (true)
>>>>>>>           ga = do_something ();
>>>>>>>           if (!ga)
>>>>>>>             break;
>>>>>>> }
>>>>>>>     for (; i < n; i = inc (i))
>>>>>>> {
>>>>>>>       if (false)
>>>>>>>            ga = do_something ();
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>>>>> statement to be 'true', after returning from loop_version, there is a
>>>>>>> piece of code forcing the condition in second loop to be false,
>>>>>>> and the original condition is moved from *in loop* to *exit edge*
>>>>>>> between loop1 and loop2.
>>>>>>
>>>>>> Yes, one complication is that we use loop_version but do not retain
>>>>>> the CFG it creates.  Something like the vectorizers
>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>>>>> but then that code doesn't do any scaling yet.  But then
>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>>>>> we could write a variant that simply doesn't mangle the CFG
>>>>>> with a new condition switching between both loops but keeps them
>>>>>> executed after each other ...
>>>>>>
>>>>>> As said, this adds to the confusion and some awkwardness.
>>>>>
>>>>> Then loop_version in loop split requires two types of variant, one
>>>>> is to insert condition to loop preheader for 'split_loops' usage,
>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>>>>> usage, it needs one extra function to encapsulate these cfg alterations
>>>>> from outside to inside.
>>>>>
>>>>> unswitching only execute one loop as it only moves the invariant condition
>>>>> to first loop's pre-header.  While 'split_loops' and
>>>>> 'do_split_loop_on_cond'
>>>>> may execute both loops no matter the condition is moved to the first 
>>>>> loop's
>>>>> preheader or exit.
>>>>>
>>>>> The condition stmt in loop unswitching is invariant, but it is variant
>>>>> in loop splitting, that's why loop unswitching execute only one loop
>>>>> but loop splitting executes both loops.
>>>>>
>>>>> Seems we need two condition arguments for loop_version, one for connecting
>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's 
>>>>> exit
>>>>> to loop2's header?  Then it will be more generic for both unswitching pass
>>>>> and splitting pass.  Looks a bit complicated and conflicted with
>>>>> loop_version's
>>>>> comments:
>>>>>
>>>>> /* Main entry point for Loop Versioning transformation.
>>>>>
>>>>>     This transformation given a condition and a loop, creates
>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>>>>
>>>>>
>>>>> And this only works for loop split usage, those many other places
>>>>> doesn't use loop_version like this?
>>>>
>>>> Yes, as said if you don't want the above CFG then you probably
>>>> shouldn't use loop_version but instead use its building blocks
>>>> (and some refactoring of loop_version can make that easier).
>>>>
>>>> I think splitting wants
>>>>
>>>>    loop_copy1
>>>>    if (condition)
>>>>      loop_copy2
>>>>
>>>> IMHO it would be good to split 'loopify' into the actual loopification
>>>> and the scaling.  Basically make the part of loop_version that
>>>> copies the loop on the header edge and creates a loop structure for
>>>> it separate.
>>>>
>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
>>>> (copy_loop_before).
>>>>
>>>
>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
>>> copying loops with single exit, it would cause many ICEs in it even
>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
>>> is NULL returning from single_exit (loop).).  Seems loop_version is
>>> more flexible for loop split.
>>
>> Hmm, sure - loop_version does not need to do anything special with
>> exits since control flow either enters the original or the loop copy.
>>
>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
>> control flow that enters _both_ loops, so it needs to have
>> an edge from the first loop exit to the second loop entry.
>>
>> One could extend this to a region copy, copying eventual CFG merges
>> of exits and specifying which exit of a SEME region is supposed
>> to be connected to the original region entry.
>>
>> After all that's what loop splitting needs in the end - though not
>> sure what exactly it does with more than one exit.
> 
> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> the recently added split_loop_on_cond could process multiple exits loop.
> 
> For example, do some changes to the loop-cond-split-1.c,
> 
> int ga;
> extern int a;
> extern int b;
> extern int c;
> 
> void test1 (int n) {
>   int i;
>   for (i = 0; i < n; i = inc (i)). {
>       if (a+3 > 0)
>        break;
> 
>       if (ga)
>         ga = do_something ();
> 
>       for (int j = 0; j < n; j++)
>         {
>            b+=5;
>            if (b > c) break;
>         }
>     }
> }
> 
> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> I am not sure whether it is valuable to do semi-invariant loop split to such
> cases with multiple exits, but obviously the split_loop_on_cond is a special
> case from split_loop both duplicate loop to 
>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> The only difference is condition1 is true for split_loop_on_cond.
> 
>>
>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
>> which doesn't actually require a single exit but a single "important"
>> exit.  That might match how you treat multiple exits.
> 
> gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
> I found that loop_version is still much better
> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> since it could handle all cases like multiple exits/subloops, etc.  I did some
> refactor to the code to introduce loop_version2 to create duplicated loops
> with two input conditions as attached patch, is this reasonable enough?
> 
> I also tried to copy the code in loop_version out of it to don't call 
> loop_version
> in loop_version2, but it seems useless with many duplicate code and NOT get 
> rid
> of creating "if (condition1) {loop_copy1}" at first?
> 
> 

The previous attached patch 
0001-Add-loop_version2-to-support-loop-transformation-wit.patch
had issues when connecting two loops, reason is split_loop connect_loops at 
*exit* edge,
do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs 
even
both with two conditions :(.  It seems not so that useful to also define 
another new function
'connect_loops_at_latch' given the limited usage in loop split only?  And the 
loop_version2
also looks not so general again ...


-- 
Thanks,
Xionghu

Reply via email to