https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
Last reconfirmed|2019-03-04 00:00:00 |2026-02-02
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
I believe this is all about by-pieces move tuning. The problematic thing
we do is:
movq $0, 64(%rsp)
movq %rax, 72(%rsp)
movdqa 64(%rsp), %xmm1
movaps %xmm1, 48(%rsp)
...
movq $1, 80(%rsp)
movsd %xmm0, 88(%rsp)
movdqa 80(%rsp), %xmm2
movaps %xmm2, 48(%rsp)
The earlier scalar stores fail to forward to the wider XMM1/XMM2 loads, so
each load stalls until both stores have committed. With higher optimization
levels we simply elide some of the copies. The above will be bad on any
uarch.
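For illustration, here is a minimal C sketch of the hazard (my own
reduction, not the testcase from this PR): two 8-byte scalar stores
immediately followed by an aligned 16-byte load covering both of them.
The store buffer cannot combine two separate stores to satisfy one wider
load, so the load has to wait for both stores to commit.

#include <emmintrin.h>
#include <stdint.h>

/* Two narrow stores followed by one wide load over the same bytes;
   the load cannot be serviced by store-to-load forwarding.  */
__m128i
pack_pair (uint64_t a, uint64_t b)
{
  uint64_t buf[2] __attribute__ ((aligned (16)));
  buf[0] = a;                                    /* 8-byte scalar store */
  buf[1] = b;                                    /* 8-byte scalar store */
  return _mm_load_si128 ((const __m128i *) buf); /* 16-byte vector load */
}

Whether the memory round-trip actually survives depends on the optimization
level; as noted above, at higher levels the copies may simply be elided.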
I'll note that -O0 seems not to use XMM moves.
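For contrast (again just a sketch, not claiming this is what -O0 emits):
when every load has the same width and address as the store that produced
the data, store-to-load forwarding succeeds and there is no stall.

#include <stdint.h>

/* Each 8-byte load exactly matches a preceding 8-byte store, so the
   data can be forwarded directly from the store buffer.  */
void
copy_pair (uint64_t *dst, uint64_t a, uint64_t b)
{
  uint64_t buf[2];
  buf[0] = a;      /* 8-byte store */
  buf[1] = b;      /* 8-byte store */
  dst[0] = buf[0]; /* 8-byte load, same width and address: forwards */
  dst[1] = buf[1]; /* 8-byte load, same width and address: forwards */
}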
I'm not sure we can/should do much about this, as Jakub says.