On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled jump thread paths that rotate loops. This exposes an
>>> issue in profile estimation: in predict_extra_loop_exits, an outer
>>> loop's exit edge is marked as an extra exit of the inner loop and given
>>> an incorrect prediction, so a hot inner loop eventually becomes a cold
>>> loop through later optimizations. This patch adds a loop check when
>>> searching for extra exit edges to avoid an unexpected predict_edge from
>>> predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/103270
>>> * predict.c (predict_extra_loop_exits): Add loop parameter.
>>> (predict_loops): Call with loop argument.
>>
>> With changes to branch predictors it is useful to re-test their
>> effectiveness on spec and see if their hitrates are still matching
>> reality. You can do it by building spec with -fprofile-generate, train
>> it and then build with -fprofile-use -fdump-ipa-profile-details
>> and use contrib/analyze_brprob.py that will collect info on how they
>> work.
>>
>> This patch looks good to me, but it would be nice to have things reality
>> checked (and since we did not do the stats for some time, there may be
>> surprises) so if you could run the specs and post results of
>> analyze_brprob, it would be great. I will also try to get to that soon,
>> but currently I am a bit swamped by other problems I noticed on clang
>> builds.
>>
>> Thanks a lot for working on profile fixes - I am trying now to get
>> things into shape. With Martin we added basic testing infrastructure
>> for keeping track of profile updates and I am trying to see how it works
>> in practice now. Hopefully it will make it easier to judge profile
>> updating patches. I would welcome a list of patches I should look at.
>>
>> I will write separate mail on this.
>> Honza
>
>
> With the patch, analyze_brprob.py prints the data below for a PGO build.
> The script has no verification code, so how do I check whether the
> result is correct? Should I run it again without the patch and compare
> the "extra loop exit" row?
>
>
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                         BRANCHES   (REL)  BR. HITRATE          HITRATE      COVERAGE  COVERAGE   (REL)  predict.def  (REL)  HOT branches (>10%)
> noreturn call                             1    0.0%      100.00%  50.00% / 50.00%             2      2.00    0.0%                      100%:1
> Fortran zero-sized array                  3    0.0%       66.67%  41.71% / 60.50%           362    362.00    0.0%                      100%:3
> loop iv compare                          16    0.0%       93.75%  98.26% / 98.76%        279847   279.85k    0.0%                      93%:4
> __builtin_expect                         35    0.0%       97.14%  78.09% / 78.35%      17079558    17.08M    0.0%
> loop guard with recursion                45    0.1%       86.67%  85.13% / 85.14%    6722424412     6.72G    1.3%                      74%:4
> extra loop exit                          80    0.1%       58.75%  81.49% / 89.21%     438470261   438.47M    0.1%                      86%:3
> guess loop iv compare                   235    0.3%       80.85%  52.83% / 73.97%     148558247   148.56M    0.0%                      47%:3
> negative return                         241    0.3%       71.37%  25.33% / 92.61%     250402383   250.40M    0.0%                      69%:2
> loop exit with recursion                315    0.4%       74.60%  85.07% / 85.71%    9403136858     9.40G    1.8%                      59%:4
> const return                            320    0.4%       51.88%  90.45% / 95.63%     925341727   925.34M    0.2%                      76%:5
> indirect call                           377    0.5%       51.46%  84.72% / 91.14%    2133772848     2.13G    0.4%                      69%:1
> polymorphic call                        410    0.5%       44.15%  31.26% / 79.37%    3272688244     3.27G    0.6%                      53%:2
> recursive call                          506    0.7%       39.53%  44.97% / 83.92%    1211036806     1.21G    0.2%                      10%:1
> goto                                    618    0.8%       64.24%  65.37% / 83.57%     702446178   702.45M    0.1%                      20%:1
> null return                             800    1.1%       64.62%  56.59% / 77.70%     603952067   603.95M    0.1%                      28%:2
> continue                                956    1.3%       63.70%  65.65% / 79.97%    3780303799     3.78G    0.7%                      52%:3
> loop guard                             1177    1.6%       56.33%  42.54% / 80.32%    7373601457     7.37G    1.4%                      50%:2
> opcode values positive (on trees)      2020    2.7%       62.38%  64.16% / 84.44%   31695571761    31.70G    6.0%                      21%:2
> loop exit                              3293    4.4%       76.19%  87.18% / 88.35%   50377138963    50.38G    9.6%                      18%:1
> loop iterations                        4761    6.3%       99.98%  84.27% / 84.27%   73463634555    73.46G   13.9%
> pointer (on trees)                     8076   10.7%       56.23%  69.36% / 83.15%   12322099991    12.32G    2.3%
> call                                  11396   15.1%       64.14%  74.13% / 89.82%   25197949198    25.20G    4.8%                      34%:1
> opcode values nonequal (on trees)     12237   16.3%       70.70%  70.86% / 83.54%   36638772333    36.64G    6.9%
> guessed loop iterations               16760   22.3%       99.78%  91.49% / 91.49%  162952747918   162.95G   30.9%
>
> HEURISTICS                         BRANCHES   (REL)  BR. HITRATE          HITRATE      COVERAGE  COVERAGE   (REL)  predict.def  (REL)  HOT branches (>10%)
> no prediction                         12730   16.9%       39.29%  33.32% / 79.93%  121106031835   121.11G   23.0%
> first match                           25261   33.6%       92.17%  88.33% / 88.98%  296652487962   296.65G   56.3%
> DS theory                             28333   37.7%       63.03%  72.05% / 85.00%  109563734005   109.56G   20.8%
> combined                              75232  100.0%       73.17%  72.32% / 86.08%  527351738575   527.35G  100.0%
>
> Loop count: 37870
> avg. # of iter: 8444.77
> median # of iter: 7.00
> avg. (1% cutoff) # of iter: 174.68
> avg. (5% cutoff) # of iter: 55.14
> avg. (10% cutoff) # of iter: 35.21
> avg. (20% cutoff) # of iter: 26.23
> avg. (30% cutoff) # of iter: 21.70

Below is the output collected without the patch. As can be seen, the
"extra loop exit" row is unchanged, but the underlying issue should
still be fixed.

./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/benchspec
HEURISTICS                         BRANCHES   (REL)  BR. HITRATE          HITRATE      COVERAGE  COVERAGE   (REL)  predict.def  (REL)  HOT branches (>10%)
noreturn call                             1    0.0%      100.00%  50.00% / 50.00%             2      2.00    0.0%                      100%:1
Fortran zero-sized array                  3    0.0%       66.67%  41.71% / 60.50%           362    362.00    0.0%                      100%:3
loop iv compare                          16    0.0%       93.75%  98.26% / 98.76%        279847   279.85k    0.0%                      93%:4
__builtin_expect                         35    0.0%       97.14%  78.09% / 78.35%      17079558    17.08M    0.0%
loop guard with recursion                45    0.1%       86.67%  85.13% / 85.14%    6722424412     6.72G    1.3%                      74%:4
extra loop exit                          80    0.1%       58.75%  81.49% / 89.21%     438470261   438.47M    0.1%                      86%:3
guess loop iv compare                   235    0.3%       80.85%  52.83% / 73.97%     148558247   148.56M    0.0%                      47%:3
negative return                         241    0.3%       71.37%  25.33% / 92.61%     250402383   250.40M    0.0%                      69%:2
loop exit with recursion                315    0.4%       74.60%  85.07% / 85.71%    9403136858     9.40G    1.8%                      59%:4
const return                            320    0.4%       51.88%  90.45% / 95.63%     925341727   925.34M    0.2%                      76%:5
indirect call                           377    0.5%       51.46%  84.72% / 91.14%    2133772848     2.13G    0.4%                      69%:1
polymorphic call                        410    0.5%       44.15%  31.26% / 79.37%    3272688238     3.27G    0.6%                      53%:2
recursive call                          506    0.7%       39.53%  44.97% / 83.92%    1211036806     1.21G    0.2%                      10%:1
goto                                    618    0.8%       64.24%  65.37% / 83.57%     702446178   702.45M    0.1%                      20%:1
null return                             800    1.1%       64.62%  56.59% / 77.70%     603952067   603.95M    0.1%                      28%:2
continue                                956    1.3%       63.70%  65.65% / 79.97%    3780303795     3.78G    0.7%                      52%:3
loop guard                             1178    1.6%       56.37%  42.54% / 80.32%    7373601533     7.37G    1.4%                      50%:2
opcode values positive (on trees)      2020    2.7%       62.38%  64.16% / 84.44%   31695571761    31.70G    5.9%                      21%:2
loop exit                              3293    4.4%       76.19%  87.18% / 88.35%   50377138963    50.38G    9.4%                      18%:1
loop iterations                        4772    6.3%       99.98%  84.27% / 84.27%   74045982111    74.05G   13.8%
pointer (on trees)                     8076   10.7%       56.23%  69.36% / 83.15%   12322099991    12.32G    2.3%
call                                  11396   15.1%       64.14%  74.13% / 89.82%   25197949198    25.20G    4.7%                      34%:1
opcode values nonequal (on trees)     12240   16.2%       70.71%  70.86% / 83.54%   36638772682    36.64G    6.9%
guessed loop iterations               16854   22.4%       99.78%  91.21% / 91.22%  169765264401   169.77G   31.7%

HEURISTICS                         BRANCHES   (REL)  BR. HITRATE          HITRATE      COVERAGE  COVERAGE   (REL)  predict.def  (REL)  HOT branches (>10%)
no prediction                         12731   16.9%       39.30%  33.32% / 79.93%  121106031963   121.11G   22.6%
first match                           25366   33.7%       92.20%  88.24% / 88.88%  304047352001   304.05G   56.9%
DS theory                             28337   37.6%       63.03%  72.05% / 85.00%  109563734430   109.56G   20.5%
combined                              75342  100.0%       73.21%  72.49% / 86.06%  534746603167   534.75G  100.0%

Loop count: 38058
avg. # of iter: 8403.32
median # of iter: 7.00
avg. (1% cutoff) # of iter: 173.72
avg. (5% cutoff) # of iter: 54.90
avg. (10% cutoff) # of iter: 35.20
avg. (20% cutoff) # of iter: 26.35
avg. (30% cutoff) # of iter: 21.87
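
Since the script itself does no verification, the two reports could be
compared mechanically rather than by eye. Below is a minimal sketch, not
part of contrib/: it assumes each table row sits on a single line (as the
script prints it, before mail wrapping), and the names ROW, parse_report
and diff_reports are made up for this example.

```python
import re

# One unwrapped table row: name, BRANCHES, (REL), BR. HITRATE,
# the "x% / y%" HITRATE pair and the raw COVERAGE count.
# The remaining columns are ignored by this sketch.
ROW = re.compile(
    r"^(?P<name>\S.*?)\s+(?P<branches>\d+)\s+[\d.]+%\s+"
    r"(?P<br_hitrate>[\d.]+)%\s+"
    r"(?P<hit_a>[\d.]+)%\s*/\s*(?P<hit_b>[\d.]+)%\s+"
    r"(?P<coverage>\d+)\b"
)


def parse_report(text):
    """Map heuristic name -> (branches, br_hitrate, (hit_a, hit_b), coverage)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.lstrip("> "))  # tolerate mail quoting
        if m:
            rows[m["name"]] = (
                int(m["branches"]),
                float(m["br_hitrate"]),
                (float(m["hit_a"]), float(m["hit_b"])),
                int(m["coverage"]),
            )
    return rows


def diff_reports(first, second):
    """Return {name: (row_in_first, row_in_second)} for rows that differ."""
    a, b = parse_report(first), parse_report(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Calling diff_reports() on the text of the two unwrapped reports should
list only the heuristics whose numbers changed. Header lines and the
"Loop count" / "avg." lines do not match the row pattern and are skipped.
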
--
Thanks,
Xionghu