On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled jump thread paths that rotate loops.  This exposes an
>>> issue in profile estimation: in predict_extra_loop_exits, an outer loop's
>>> exit edge is marked as an extra loop exit of the inner loop and given an
>>> incorrect prediction, so a hot inner loop eventually becomes cold through
>>> later optimizations.  This patch adds a loop check when searching for
>>> extra exit edges to avoid an unexpected predict_edge from
>>> predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>>     PR middle-end/103270
>>>     * predict.c (predict_extra_loop_exits): Add loop parameter.
>>>     (predict_loops): Call with loop argument.
>>
>> With changes to branch predictors it is useful to re-test their
>> effectiveness on SPEC and see if their hit rates still match
>> reality.  You can do it by building SPEC with -fprofile-generate, training
>> it, then building with -fprofile-use -fdump-tree-ipa-profile-details
>> and using contrib/analyze_brprob.py, which will collect info on how they
>> work.
>>
>> This patch looks good to me, but it would be nice to have things reality
>> checked (and since we did not do the stats for some time, there may be
>> surprises) so if you could run the specs and post results of
>> analyze_brprob, it would be great.  I will also try to get to that soon,
>> but currently I am a bit swamped by other problems I noticed on clang
>> builds.
>>
>> Thanks a lot for working on profile fixes - I am trying now to get
>> things into shape.  With Martin we added basic testing infrastructure
>> for keeping track of profile updates and I am trying to see how it works
>> in practice now.  Hopefully it will make it easier to judge profile
>> updating patches. I would welcome a list of patches I should look at.
>>
>> I will write separate mail on this.
>> Honza
> 
> 
> With the patch, analyze_brprob.py outputs the data below for a PGO build.
> There is no verification code in the script, so how do I check whether it
> is correct?  Run it again without the patch and compare the "extra loop
> exit" row?
> 
> 
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> 
> Loop count: 37870
>   avg. # of iter: 8444.77
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 174.68
>   avg. (5% cutoff) # of iter: 55.14
>   avg. (10% cutoff) # of iter: 35.21
>   avg. (20% cutoff) # of iter: 26.23
>   avg. (30% cutoff) # of iter: 21.70

Below is the output data collected without the patch.  As can be seen,
there is no difference in the "extra loop exit" row.
Still, this issue should be fixed.


./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/

benchspec
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Loop count: 38058
  avg. # of iter: 8403.32
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 173.72
  avg. (5% cutoff) # of iter: 54.90
  avg. (10% cutoff) # of iter: 35.20
  avg. (20% cutoff) # of iter: 26.35
  avg. (30% cutoff) # of iter: 21.87
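
Since the script has no verification step of its own, the comparison of the
two runs above could be automated roughly as follows.  This is only a sketch,
not part of contrib/: the names ROW, parse_rows and diff_rows are hypothetical,
and it assumes each table row is on a single line, as the script prints it
before mail wrapping.  It only looks at the BRANCHES, BR. HITRATE, first
HITRATE value and raw COVERAGE columns.

```python
import re

# Hypothetical row pattern for one unwrapped table line:
# name, BRANCHES, (REL)%, BR. HITRATE%, "HITRATE% / second%", raw COVERAGE.
ROW = re.compile(
    r"^(?P<name>.+?)\s{2,}(?P<branches>\d+)\s+[\d.]+%\s+"
    r"(?P<br_hitrate>[\d.]+)%\s+(?P<hitrate>[\d.]+)%\s*/\s*[\d.]+%\s+"
    r"(?P<coverage>\d+)\s"
)


def parse_rows(text):
    """Map heuristic name -> (branches, br_hitrate, hitrate, coverage)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and "Loop count" lines do not match and are skipped
            rows[m["name"]] = (
                int(m["branches"]),
                float(m["br_hitrate"]),
                float(m["hitrate"]),
                int(m["coverage"]),
            )
    return rows


def diff_rows(with_patch, without_patch):
    """Return {name: (with, without)} for rows that differ between runs."""
    changed = {}
    for name in sorted(set(with_patch) | set(without_patch)):
        a, b = with_patch.get(name), without_patch.get(name)
        if a != b:
            changed[name] = (a, b)
    return changed
```

Feeding the saved output of both runs through parse_rows and then diff_rows
lists only the heuristics whose numbers changed, so an unchanged "extra loop
exit" row simply does not show up in the result.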


-- 
Thanks,
Xionghu
