On 2021/12/13 18:24, Jan Hubicka wrote:
>>> gcc/ChangeLog:
>>>
>>> * loop-invariant.c (find_invariants_bb): Check profile count
>>> before motion.
>>> (find_invariants_body): Add argument.
>>> ---
>>> gcc/loop-invariant.c | 10 +++++++---
>>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 5eee2e5c9f8..c61c8612fae 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool
>>> always_reached, bool always_executed)
>>> call. */
>>>
>>> static void
>>> -find_invariants_bb (basic_block bb, bool always_reached, bool
>>> always_executed)
>>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>> + bool always_executed)
>>> {
>>> rtx_insn *insn;
>>> + basic_block preheader = loop_preheader_edge (loop)->src;
>>> +
>>> + if (preheader->count > bb->count)
>>> + return;
>>
>> Please add a comment explaining the conditional and if possible also a
>> testcase. Since profile updating and use is sensitive topic and it may
>> trigger regressions later, it is important to keep track of info why
>> given tests was added.
OK. Comments like?
/* Don't move insn of cold BB out of loop to preheader to reduce calculations
and register live range in hot loop with cold BB. */
And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb,
bool always_reached,
basic_block preheader = loop_preheader_edge (loop)->src;
if (preheader->count > bb->count)
- return;
+ {
+ if (dump_file)
+ fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
+ bb->index, loop->num);
+ return;
+ }
This case could reflect the patch's effect:
gcc/testsuite/gcc.dg/loop-invariant-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
volatile int x;
void
bar (int, char *, char *);
void
foo (int *a, int n, int k)
{
int i;
for (i = 0; i < n; i++)
{
if (__builtin_expect (x, 0))
bar (k / 5, "one", "two");
a[i] = k;
}
}
/* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant"
} } */
insn 27,28,29 was hoisted out of loop, with the count test patch, they are kept
in
loop body.
diff -U15 base/ssa-lim-23.c.271r.loop2_invariant
patched/ssa-lim-23.c.271r.loop2_invariant
*****ending processing of loop 1 ******
-Set in insn 27 is invariant (0), cost 16, depends on
-Set in insn 28 is invariant (1), cost 16, depends on
-Set in insn 29 is invariant (2), cost 8, depends on
-Set in insn 30 is invariant (3), cost 0, depends on 0
-Set in insn 31 is invariant (4), cost 0, depends on 1
-Set in insn 32 is invariant (5), cost 0, depends on 2
-Decided to move invariant 0 -- gain 16
-Decided to move invariant 1 -- gain 16
-Decided to move invariant 2 -- gain 8
-deferring rescan insn with uid = 27.
-deferring rescan insn with uid = 30.
-deferring rescan insn with uid = 61.
-changing bb of uid 27
- from 5 to 3
-deferring rescan insn with uid = 28.
-deferring rescan insn with uid = 31.
-deferring rescan insn with uid = 62.
-changing bb of uid 28
- from 5 to 3
-deferring rescan insn with uid = 29.
-deferring rescan insn with uid = 32.
-deferring rescan insn with uid = 63.
-changing bb of uid 29
- from 5 to 3
starting the processing of deferred insns
-rescanning insn with uid = 27.
-rescanning insn with uid = 28.
-rescanning insn with uid = 29.
-rescanning insn with uid = 30.
-rescanning insn with uid = 31.
-rescanning insn with uid = 32.
-rescanning insn with uid = 61.
-rescanning insn with uid = 62.
-rescanning insn with uid = 63.
ending the processing of deferred insns
starting the processing of deferred insns
ending the processing of deferred insns
...
55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47
REG_EQUAL `*.LANCHOR0'
- 27: r139:DI=unspec[`*.LC0',%2:DI] 47
- 28: r140:DI=unspec[`*.LC1',%2:DI] 47
- 29: r141:DI=sign_extend(r118:SI)
39: L39:
21: NOTE_INSN_BASIC_BLOCK 4
23: r117:SI=[r138:DI]
24: r133:CC=cmp(r117:SI,0)
REG_DEAD r117:SI
25: pc={(r133:CC==0)?L34:pc}
REG_DEAD r133:CC
REG_BR_PROB 966367644
26: NOTE_INSN_BASIC_BLOCK 5
- 61: r134:DI=r139:DI
- 62: r135:DI=r140:DI
- 63: r136:DI=r141:DI
- 30: %5:DI=r139:DI
+ 27: r134:DI=unspec[`*.LC0',%2:DI] 47
+ REG_EQUAL `*.LC0'
+ 28: r135:DI=unspec[`*.LC1',%2:DI] 47
+ REG_EQUAL `*.LC1'
+ 29: r136:DI=sign_extend(r118:SI)
+ 30: %5:DI=r134:DI
REG_DEAD r134:DI
REG_EQUAL `*.LC0'
- 31: %4:DI=r140:DI
+ 31: %4:DI=r135:DI
REG_DEAD r135:DI
REG_EQUAL `*.LC1'
- 32: %3:DI=r141:DI
+ 32: %3:DI=r136:DI
REG_DEAD r136:DI
33: call [`bar'] argc:0
REG_DEAD %5:DI
REG_DEAD %4:DI
REG_DEAD %3:DI
REG_CALL_DECL `bar'
34: L34:
35: NOTE_INSN_BASIC_BLOCK 6
>>
>> I wonder why the cost model chose to move any invariatns to preheader
>> why preheader->count > bb->count?
>
> Thinking about this more, you want to test:
> if (!always_executed && preheader->count > bb->count)
> return;
> This will rule out some profile misupates.
>
> Also the code updating always_reached is worng:
> if (always_reached
> && CALL_P (insn)
> && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
> || ! RTL_CONST_OR_PURE_CALL_P (insn)))
> always_reached = false;
> It misses the case where statement can trhow external exception or
> volatile asms that can also terminate execution.
>
> I bit worry on how often the test will have false positives with guessed
> profile where earlier loop optimizations reduced trip count below
> realistic estimate.
Sorry I don't understand why always_executed and always_reached matters here?
the test is based on BB before the FOR_BB_INSNS loop...
>
> Honza
>>
>> Honza
>>>
>>> FOR_BB_INSNS (bb, insn)
>>> {
>>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block
>>> *body,
>>> unsigned i;
>>>
>>> for (i = 0; i < loop->num_nodes; i++)
>>> - find_invariants_bb (body[i],
>>> - bitmap_bit_p (always_reached, i),
>>> + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>>> bitmap_bit_p (always_executed, i));
>>> }
>>>
>>> --
>>> 2.25.1
>>>
--
Thanks,
Xionghu