https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115029
Roger Sayle <roger at nextmovesoftware dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
An alternate explanation might be that (according to the attached disassembly
files) __muldc3 was 696 bytes long in GCC 13, and 774 bytes long in GCC 14.
Alas, there are a large number of differences between the two .asm files; the
latter is 171 lines longer! The code for complex multiplication in libgcc
hasn't changed, but the "hot" path tests for "isnan(x) && isnan(y)", which in
GCC 13 was implemented as:
    2b04:  66 0f 2e ed           ucomisd %xmm5,%xmm5
    2b08:  0f 9a c0              setp    %al
    2b0b:  66 0f 2e c9           ucomisd %xmm1,%xmm1
    2b0f:  0f 9a c2              setp    %dl
    2b12:  20 d0                 and     %dl,%al
    2b14:  0f 84 be 01 00 00     je      2cd8 <__muldc3+0x218>
which presumably allows the two ucomisd instructions to be run in parallel,
but in GCC 14 this is replaced with:
    2dd1:  66 0f 2e ed           ucomisd %xmm5,%xmm5
    2dd5:  0f 8b 3a 01 00 00     jnp     2f15 <__muldc3+0x185>
    2ddb:  66 0f 2e c9           ucomisd %xmm1,%xmm1
    2ddf:  0f 8b 30 01 00 00     jnp     2f15 <__muldc3+0x185>
which has two conditional jumps in the same "paragraph", which might play
poorly with branch prediction.
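
For reference, the two shapes above correspond roughly to the following two
ways of writing the test in C. This is only a sketch to illustrate the
difference: the function names are hypothetical, neither function is taken
from libgcc, and a compiler is free to turn either form into the other, so
the actual code generated should be checked with objdump.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper, not from libgcc.  Both NaN tests are evaluated
   unconditionally (x != x is true only for NaN) and combined with a
   bitwise AND, so the caller needs at most one branch on the result.
   This is the shape of the GCC 13 sequence (two setp, one and, one je).  */
static bool both_nan_branchless(double x, double y)
{
    return (x != x) & (y != y);
}

/* Hypothetical helper, not from libgcc.  Short-circuit form: y is only
   tested when x is NaN.  Lowered literally, this gives one conditional
   jump per operand, which is the shape of the GCC 14 sequence (two jnp).  */
static bool both_nan_short_circuit(double x, double y)
{
    return isnan(x) && isnan(y);
}
```

Both functions return the same value for every input; they differ only in
how many conditional branches a literal translation would contain.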
As a control, it might be useful to run a new object file linked against an old
libgcc (and/or an old object file linked against a new libgcc), to rule out the
possibility that the problem is not visible on godbolt at all, but is instead a
"hidden" regression in the complex multiplication library call, which is the
hottest part of the FFT stress test.
I'll continue investigating; it still might be sse/avx constant
materialization...