[Bug target/115029] [14/15/16 regression] FFT computation performance regression, x86, between gcc-14 and gcc-13 on skylake platform

pinskia at gcc dot gnu.org via Gcc-bugs Tue, 10 Feb 2026 23:03:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115029


--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So the only change between GCC 13 and 14 that might make a difference is how
{ 4, 4, 4, 4 } and { -2113396605, -2113396605, -2113396605, -2113396605 } are
formed in the front part of stress_cpu_fft. I suspect that since the loops are
small enough, the front part of stress_cpu_fft takes enough time to make a
difference.

And it looks like, depending on the micro-arch, loading from memory (L1 most
likely in this case) is slightly faster than creating the value in the GPRs and
moving it into a vector register.
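
To make the two strategies concrete, here is a minimal sketch using the GCC
vector extensions. It is not taken from stress-ng or from the compiler output
in this bug; the names splat_from_memory, splat_from_scalar, splats_agree and
k_four are made up for illustration. Whether the compiler actually emits a
vector load for the first function and a GPR-to-vector move plus broadcast for
the second depends on the GCC version, the -march setting and the optimization
level, so treat the comments as the intent rather than a guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Strategy A: the constant lives in memory (typically .rodata) and is
   brought into a vector register with a single load, most likely from L1. */
static const int32_t k_four[4] __attribute__((aligned(16))) = { 4, 4, 4, 4 };

v4si splat_from_memory(void)
{
    v4si v;
    memcpy(&v, k_four, sizeof v);   /* usually folds to one vector load */
    return v;
}

/* Strategy B: the value starts as a scalar (in a GPR on x86) and is
   replicated into every lane of a vector register. */
v4si splat_from_scalar(int32_t x)
{
    return (v4si){ x, x, x, x };
}

/* Helper: returns 1 if all four lanes of a and b are equal, else 0. */
int splats_agree(v4si a, v4si b)
{
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

Both functions produce the same vector; only the way the value reaches the
vector register differs. Compiling this with each compiler using something
like "gcc -O2 -march=skylake -S" and comparing the assembly is one way to see
which form a given GCC version picks.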

[Bug target/115029] [14/15/16 regression] FFT computation performance regression, x86, between gcc-14 and gcc-13 on skylake platform