https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125876

            Bug ID: 125876
           Summary: [13/14/15/16/17 Regression] x86: register-source
                    vmovddup spilled to the stack instead of using the
                    register form
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Ashwin.Godbole at amd dot com
                CC: Sarvesh.Chandra at amd dot com, vekumar at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-linux-gnu
            Target: x86_64-linux-gnu

Created attachment 64770
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64770&action=edit
Fix (patchfile)

Since GCC 13, a 256/512-bit movddup (_mm512_movedup_pd / _mm256_movedup_pd)
whose source is already in a register is spilled to the stack and reloaded
via the memory form of vmovddup, instead of using the register form
"vmovddup %zmm,%zmm" / "%ymm,%ymm".

GCC 12.x is unaffected.

$ gcc -O2 -mavx512f -S test.c

f512 actual  : vmovapd %zmm0,-64(%rsp); vmovddup -64(%rsp),%zmm0   (+ frame)
f512 expected: vmovddup %zmm0,%zmm0                                (clang, GCC 12)
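
The test.c used above is not reproduced in this report. A minimal sketch of
what such a file might contain, assuming only the function name f512 from the
output above, is shown here. The f256 function, the target attributes and the
run_* / call_* wrappers are hypothetical additions: the attributes let the
file build without -mavx512f as well, and the wrappers allow a runtime sanity
check of the results on CPUs that support the instructions.

```c
#include <immintrin.h>

/* Hypothetical stand-in for test.c; not the reporter's actual file.
   The target attributes are redundant when building with -mavx512f. */
__attribute__((target("avx512f")))
__m512d f512(__m512d x)
{
    /* x arrives in %zmm0, so the register form of vmovddup is expected. */
    return _mm512_movedup_pd(x);
}

__attribute__((target("avx")))
__m256d f256(__m256d x)
{
    /* Same pattern for the 256-bit variant (x arrives in %ymm0). */
    return _mm256_movedup_pd(x);
}

/* Pointer-based helpers (hypothetical) so callers built without AVX
   never pass vector arguments across the ABI boundary. */
__attribute__((target("avx512f")))
static void call_f512(const double *in, double *out)
{
    _mm512_storeu_pd(out, f512(_mm512_loadu_pd(in)));
}

__attribute__((target("avx")))
static void call_f256(const double *in, double *out)
{
    _mm256_storeu_pd(out, f256(_mm256_loadu_pd(in)));
}

/* Return 1 if the result was computed, 0 if the CPU lacks AVX-512F. */
int run_f512(const double *in, double *out)
{
    if (!__builtin_cpu_supports("avx512f"))
        return 0;
    call_f512(in, out);
    return 1;
}

/* Return 1 if the result was computed, 0 if the CPU lacks AVX. */
int run_f256(const double *in, double *out)
{
    if (!__builtin_cpu_supports("avx"))
        return 0;
    call_f256(in, out);
    return 1;
}
```

Compiling a file like this with the command above and inspecting the body of
f512 in test.s should show whether the stack round-trip is present. Whether
this exact sketch triggers the spill has not been verified here.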

Bisected to r13-3587-g4acc4c2be84 "Fix incorrect digit constraint
[PR target/107057]"; the parent r13-3586-g5c5ef2f9ab5 is good. That commit
turned avx512f_movddup512 / avx_movddup256 into define_insns whose operand 1
uses a memory-only "m" constraint, while the predicate is nonimmediate_operand,
so LRA spills a register source to satisfy it. The sibling unpcklpd patterns
use "vm".

Tested on x86_64-linux-gnu (AMD Zen 4).
