https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #27 from Ilya Leoshkevich <iii at linux dot ibm.com> ---
With

-DSPEC_CPU -DNDEBUG -DPERL_CORE   -O3 -save-temps=obj -fopt-info-vec-optimized 
     -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 -fgnu89-inline

on gcc113 I can see a 2% slowdown:

r277511 (without this fix): 880.09s
r277515 (with this fix):    897.85s

The function that degraded the most is indeed S_regmatch:

$ perf diff perf-9760321.data perf-44b2b4c.data
    32.24%           exe                [.] S_regmatch                        
     8.92%           exe                [.] S_find_byclass.isra.0             
     6.80%   +0.28%  libc-2.19.so       [.] 0x000000000007dec0                
     5.20%           exe                [.] S_regtry                          

However, the "shape" of S_regmatch did not change, that is, when all
offsets and register numbers are replaced with "x" in the objdump
output, the old and the new versions are identical.  This hints at some
microarchitectural effect - aliasing in the branch predictor maybe?
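
The "shape" comparison described above can be sketched roughly as follows.
This is only an illustration, not the script actually used for this comment:
the function name `shape` and the regular expression are hypothetical, and it
assumes the usual tab-separated `objdump -d` layout (address, raw bytes,
instruction).  It masks numbers only, so register names without digits are
left as they are.

```python
import re

# A hex literal, or any run of hex digits that contains at least one
# decimal digit (covers offsets, addresses and register numbers).
_NUMBER = re.compile(r"(?:0x)?[0-9a-f]*\d[0-9a-f]*")


def shape(objdump_text):
    """Return the instruction lines of `objdump -d` output with every
    number in the operands replaced by "x"."""
    lines = []
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        # Assumed layout: "<addr>:\t<raw bytes>\t<mnemonic and operands>".
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = " ".join(fields[2:]).split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        lines.append((mnemonic + " " + _NUMBER.sub("x", operands)).strip())
    return lines
```

If `shape(old_dump) == shape(new_dump)` for the disassembly of one function,
the two builds differ only in offsets and register numbers.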

From my perspective, this happens too often, so I use the following test
to rule this out: just add a nop at the beginning of the problematic
function.  This changes all the offsets and makes the aliasing situation
completely different.  And indeed, by adding a single nop to S_regmatch,
I get wildly different results (for now this is just 1 repeat, I will
run best-of-3 overnight):

r277511 (without this fix): 929.1s
r277515 (with this fix):    931.48s
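
For reference, the relative slowdown for both pairs of timings quoted in this
comment can be computed as below.  The helper name `slowdown_percent` is just
for illustration.

```python
def slowdown_percent(before_s, after_s):
    """Relative slowdown of `after_s` versus `before_s`, in percent."""
    return (after_s - before_s) / before_s * 100.0


# Original builds: r277511 vs. r277515.
print(f"{slowdown_percent(880.09, 897.85):.2f}%")  # 2.02%
# Same revisions with a single nop added to S_regmatch (1 repeat only).
print(f"{slowdown_percent(929.1, 931.48):.2f}%")   # 0.26%
```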
