On 2021/12/13 18:24, Jan Hubicka wrote:
>>> gcc/ChangeLog:
>>>
>>>     * loop-invariant.c (find_invariants_bb): Check profile count
>>>     before motion.
>>>     (find_invariants_body): Add argument.
>>> ---
>>>  gcc/loop-invariant.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 5eee2e5c9f8..c61c8612fae 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
>>> always_reached, bool always_executed)
>>>     call.  */
>>>  
>>>  static void
>>> -find_invariants_bb (basic_block bb, bool always_reached, bool 
>>> always_executed)
>>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>> +               bool always_executed)
>>>  {
>>>    rtx_insn *insn;
>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>>> +
>>> +  if (preheader->count > bb->count)
>>> +    return;
>>
>> Please add a comment explaining the conditional and if possible also a
>> testcase.  Since profile updating and use is sensitive topic and it may
>> trigger regressions later, it is important to keep track of info why
>> given tests was added.
 
OK. Comments like?

/* Don't move insn of cold BB out of loop to preheader to reduce calculations
   and register live range in hot loop with cold BB.  */


And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.

--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, 
bool always_reached,
   basic_block preheader = loop_preheader_edge (loop)->src;

   if (preheader->count > bb->count)
-    return;
+    {
+      if (dump_file)
+       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
+                bb->index, loop->num);
+      return;
+    }


This case could reflect the patch's effect:


gcc/testsuite/gcc.dg/loop-invariant-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */

volatile int x;
void
bar (int, char *, char *);
void
foo (int *a, int n, int k)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (__builtin_expect (x, 0))
        bar (k / 5, "one", "two");
      a[i] = k;
    }
}

/* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant" 
} } */


insn 27,28,29 was hoisted out of loop, with the count test patch, they are kept 
in
loop body.

 diff -U15 base/ssa-lim-23.c.271r.loop2_invariant 
patched/ssa-lim-23.c.271r.loop2_invariant

 *****ending processing of loop 1 ******
-Set in insn 27 is invariant (0), cost 16, depends on
-Set in insn 28 is invariant (1), cost 16, depends on
-Set in insn 29 is invariant (2), cost 8, depends on
-Set in insn 30 is invariant (3), cost 0, depends on 0
-Set in insn 31 is invariant (4), cost 0, depends on 1
-Set in insn 32 is invariant (5), cost 0, depends on 2
-Decided to move invariant 0 -- gain 16
-Decided to move invariant 1 -- gain 16
-Decided to move invariant 2 -- gain 8
-deferring rescan insn with uid = 27.
-deferring rescan insn with uid = 30.
-deferring rescan insn with uid = 61.
-changing bb of uid 27
-  from 5 to 3
-deferring rescan insn with uid = 28.
-deferring rescan insn with uid = 31.
-deferring rescan insn with uid = 62.
-changing bb of uid 28
-  from 5 to 3
-deferring rescan insn with uid = 29.
-deferring rescan insn with uid = 32.
-deferring rescan insn with uid = 63.
-changing bb of uid 29
-  from 5 to 3
 starting the processing of deferred insns
-rescanning insn with uid = 27.
-rescanning insn with uid = 28.
-rescanning insn with uid = 29.
-rescanning insn with uid = 30.
-rescanning insn with uid = 31.
-rescanning insn with uid = 32.
-rescanning insn with uid = 61.
-rescanning insn with uid = 62.
-rescanning insn with uid = 63.
 ending the processing of deferred insns
 starting the processing of deferred insns
 ending the processing of deferred insns

...

    55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47
       REG_EQUAL `*.LANCHOR0'
-   27: r139:DI=unspec[`*.LC0',%2:DI] 47
-   28: r140:DI=unspec[`*.LC1',%2:DI] 47
-   29: r141:DI=sign_extend(r118:SI)
    39: L39:
    21: NOTE_INSN_BASIC_BLOCK 4
    23: r117:SI=[r138:DI]
    24: r133:CC=cmp(r117:SI,0)
       REG_DEAD r117:SI
    25: pc={(r133:CC==0)?L34:pc}
       REG_DEAD r133:CC
       REG_BR_PROB 966367644
    26: NOTE_INSN_BASIC_BLOCK 5
-   61: r134:DI=r139:DI
-   62: r135:DI=r140:DI
-   63: r136:DI=r141:DI
-   30: %5:DI=r139:DI
+   27: r134:DI=unspec[`*.LC0',%2:DI] 47
+      REG_EQUAL `*.LC0'
+   28: r135:DI=unspec[`*.LC1',%2:DI] 47
+      REG_EQUAL `*.LC1'
+   29: r136:DI=sign_extend(r118:SI)
+   30: %5:DI=r134:DI
       REG_DEAD r134:DI
       REG_EQUAL `*.LC0'
-   31: %4:DI=r140:DI
+   31: %4:DI=r135:DI
       REG_DEAD r135:DI
       REG_EQUAL `*.LC1'
-   32: %3:DI=r141:DI
+   32: %3:DI=r136:DI
       REG_DEAD r136:DI
    33: call [`bar'] argc:0
       REG_DEAD %5:DI
       REG_DEAD %4:DI
       REG_DEAD %3:DI
       REG_CALL_DECL `bar'
    34: L34:
    35: NOTE_INSN_BASIC_BLOCK 6

>>
>> I wonder why the cost model chose to move any invariatns to preheader
>> why preheader->count > bb->count?
> 
> Thinking about this more, you want to test:
>   if (!always_executed && preheader->count > bb->count)
>     return;
> This will rule out some profile misupates.
> 
> Also the code updating always_reached is worng:
>       if (always_reached
>           && CALL_P (insn)
>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>   always_reached = false;
> It misses the case where statement can trhow external exception or
> volatile asms that can also terminate execution.
> 
> I bit worry on how often the test will have false positives with guessed
> profile where earlier loop optimizations reduced trip count below
> realistic estimate.

Sorry I don't understand why always_executed and always_reached matters here?
the test is based on BB before the FOR_BB_INSNS loop...

> 
> Honza
>>
>> Honza
>>>  
>>>    FOR_BB_INSNS (bb, insn)
>>>      {
>>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
>>> *body,
>>>    unsigned i;
>>>  
>>>    for (i = 0; i < loop->num_nodes; i++)
>>> -    find_invariants_bb (body[i],
>>> -                   bitmap_bit_p (always_reached, i),
>>> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>>>                     bitmap_bit_p (always_executed, i));
>>>  }
>>>  
>>> -- 
>>> 2.25.1
>>>

-- 
Thanks,
Xionghu

Reply via email to