[Bug target/121230] [13/14/15/16 regression] x86: Inefficient code generation with -m3dnow -msse since GCC 12 since r12-7612

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 03 Dec 2025 06:22:35 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121230


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is basically

          /* When the component is loaded from memory we can directly
             move it to a vector register, otherwise we have to go
             via a GPR or via vpinsr which involves similar cost.
             Likewise with a BIT_FIELD_REF extracting from a vector
             register we can hope to avoid using a GPR.  */
          if (!is_gimple_assign (def) 
              || ((!gimple_assign_load_p (def)
                   || (!TARGET_SSE4_1
                       && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
                  && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
                      || !VECTOR_TYPE_P (TREE_TYPE
                                (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
            {
              if (fp)
                m_num_sse_needed[where]++;
              else
                {
                  m_num_gpr_needed[where]++;

                  int cost = COSTS_N_INSNS (ix86_cost->integer_to_sse) / 2;


where we make a move from FP stack reg to FP XMM reg free, assuming that
FP is done in XMM regs.  The def stmt here is an add, but without
-mfpmath=sse we have to spill to the stack and re-load to XMM.  There's
no special cost for this like integer_to_sse.  Also I'm not sure about the
exact TARGET_* flag to check for -mfpmath=sse (I guess sse,x87 should be
handled conservatively).  Changing the above if (fp) to if (0), and thus
applying the integer-to-sse cost, disables vectorization w/o -mfpmath=sse.

Somebody with more target knowledge around -mfpmath should put costing
into the if (fp) path accounting for FP REG to stack + XMM load from stack.
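
As an illustration only, the accounting asked for above might look like the
following standalone sketch.  It is not GCC code: FpMath, ToyCosts, fp_store,
sse_load and element_to_vector_cost are hypothetical names made up for this
example, and the integer branch only loosely mirrors the quoted
integer_to_sse / 2 computation.  The mixed sse,x87 mode is treated the same
as x87, following the "handled conservatively" guess above.

```cpp
#include <cassert>

// Hypothetical stand-in for the -mfpmath setting; not a GCC type.
enum class FpMath { X87, SSE, Both };

// Hypothetical cost table; only integer_to_sse echoes a name from the
// quoted code, the other two fields are assumptions for this sketch.
struct ToyCosts {
  int integer_to_sse;  // GPR -> XMM move
  int fp_store;        // FP stack register -> stack slot
  int sse_load;        // stack slot -> XMM register
};

// Cost of getting one scalar element, produced by a non-load statement
// (such as an add), into a vector register.
int element_to_vector_cost(bool fp, FpMath fpmath, const ToyCosts &c)
{
  if (!fp)
    return c.integer_to_sse / 2;   // integer path, as in the quoted code
  if (fpmath == FpMath::SSE)
    return 0;                      // value is already in an XMM register
  // x87 or mixed mode (conservative): spill to the stack, reload to XMM.
  return c.fp_store + c.sse_load;
}
```

With a table such as ToyCosts{4, 3, 2}, an FP element costs 0 under
FpMath::SSE but 5 under FpMath::X87, so the vectorizer would no longer see
the FP-stack-to-XMM transfer as free.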

[Bug target/121230] [13/14/15/16 regression] x86: Inefficient code generation with -m3dnow -msse since GCC 12 since r12-7612
