https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007
--- Comment #27 from Ilya Leoshkevich <iii at linux dot ibm.com> ---
With

-DSPEC_CPU -DNDEBUG -DPERL_CORE -O3 -save-temps=obj -fopt-info-vec-optimized -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 -fgnu89-inline

on gcc113 I can see a 2% slowdown:

r277511 (without this fix): 880.09s
r277515 (with this fix): 897.85s

The function that degraded the most is indeed S_regmatch:

$ perf diff perf-9760321.data perf-44b2b4c.data
32.24% exe [.] S_regmatch
8.92% exe [.] S_find_byclass.isra.0
6.80% +0.28% libc-2.19.so [.] 0x000000000007dec0
5.20% exe [.] S_regtry

However, the "shape" of S_regmatch did not change: when all offsets and register numbers in the objdump output are replaced with "x", the old and the new versions are identical. This hints at some microarchitectural effect, maybe aliasing in the branch predictor.

From my perspective, this happens too often, so I use the following test to rule it out: just add a nop at the beginning of the problematic function. This changes all the offsets and makes the aliasing situation completely different.

And indeed, by adding a single nop to S_regmatch, I get wildly different results (for now this is just 1 repeat; I will run best-of-3 overnight):

r277511 (without this fix): 929.1s
r277515 (with this fix): 931.48s
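
For illustration, the "shape" comparison described above could be sketched roughly as follows. This is not the script used for the measurements in this comment; the function name `normalize` and the regular expressions are hypothetical, and the sketch assumes tab-separated `objdump -d` output with `%`-prefixed register names (as on x86-64 in AT&T syntax). Other targets would likely need different patterns.

```python
import re
import sys


def normalize(objdump_text):
    """Return the instruction list with offsets and register numbers replaced by 'x'.

    Hypothetical helper: assumes `objdump -d` lines of the form
    "<address>:\t<raw bytes>\t<mnemonic and operands>".
    """
    shape = []
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # symbol headers, blank lines, byte-only continuation lines
        insn = fields[2].strip()
        insn = re.sub(r"\s*#.*$", "", insn)        # drop trailing "# addr <sym>" comments
        insn = re.sub(r"0x[0-9a-f]+", "x", insn)   # immediates, displacements, <sym+0x..>
        # Bare hex tokens containing at least one digit (e.g. branch target addresses).
        insn = re.sub(r"\b(?=[0-9a-f]*\d)[0-9a-f]+\b", "x", insn)
        insn = re.sub(r"(%[a-z]+)\d+[a-z]*", r"\1x", insn)  # %r12 -> %rx, %xmm3 -> %xmmx
        insn = re.sub(r"\s+", " ", insn)           # collapse column padding
        shape.append(insn)
    return shape


if __name__ == "__main__":
    # Usage: python shape.py old.dump new.dump
    with open(sys.argv[1]) as old_file, open(sys.argv[2]) as new_file:
        same = normalize(old_file.read()) == normalize(new_file.read())
    print("identical shape" if same else "shape differs")
```

The two input files would each hold the disassembly of the function of interest from one build. If the normalized lists are equal, the two builds differ only in offsets, addresses and register numbering, which is the situation described above for S_regmatch.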