On Tue, 24 Feb 2026, Richard Biener wrote:

> The following allows vectorizing the gcc.target/i386/pr111023*.c
> testcases again with -m32 -msse2 by ensuring we see through a cast
> when looking for memory or vector extract sources during costing
> of vector construction.
> 
> This, together with the forwprop fix fixes the regression on those testcases.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK if that succeeds?

While that succeeds, experimenting shows that zero- and sign-extends
are not handled when moving from memory.  I think we can do zero-extends
for SImode and DImode (movd/movq) and for smaller modes via pre-zeroing
of %xmm and pinsr.  I'm leaving that for a separate patch.  Below is a
revised patch that cleans up the various conditions and only touches the
vector extract [ -> conversion ] -> vector CTOR path to allow all
conversions.
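
For illustration, here is a minimal sketch of the source-level shape that
produces a vector extract -> conversion -> vector CTOR chain.  It is not
taken from the pr111023 testcases; the type names and the function
widen_lo are hypothetical, and it assumes GNU C vector extensions.  Each
src[i] is an element extract from a vector, the implicit widening to
unsigned short is the conversion, and the braced initializer is the
vector CTOR.  Whether the vectorizer actually costs this exact function
through the path touched here is not something the sketch asserts.

```c
/* Hypothetical 8-byte vector types, GNU C vector extensions assumed.  */
typedef unsigned char v8qi __attribute__ ((vector_size (8)));
typedef unsigned short v4hi __attribute__ ((vector_size (8)));

/* Zero-extend the low four bytes of SRC into four 16-bit lanes.
   Every element is read from a vector, converted, and then placed
   into a vector constructor.  */
v4hi
widen_lo (v8qi src)
{
  v4hi res = { src[0], src[1], src[2], src[3] };
  return res;
}
```

With the old condition only a sign-change (a nop conversion) was looked
through, so a widening conversion like the one above hid the vector
extract from the costing code.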

Another option would be to not disable MMX <-> SSE conversion patterns
with -m32 or to revert another part of Honza's cost changes which regressed
those testcases (kill the * 2 multiplication).

Re-testing the patch below.

OK?

Thanks,
Richard.

From ac2a80af61d57ff686dbdbd97095e1c329c250e5 Mon Sep 17 00:00:00 2001
From: Richard Biener <[email protected]>
Date: Tue, 24 Feb 2026 09:53:00 +0100
Subject: [PATCH] target/120234 - adjust vector construction costs
To: [email protected]

The following allows vectorizing the gcc.target/i386/pr111023*.c
testcases again with -m32 -msse2 by ensuring we see through a cast
when looking for vector extract sources during costing of vector construction.

This, together with the forwprop fix, fixes the regression on those testcases.

        PR target/120234
        * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
        For constructor elements always look through a conversion.
        Rewrite load and vector extraction matching to be more obvious.
        Allow arbitrary conversions from the vector extract to elide
        costing of a gpr<->xmm move.
---
 gcc/config/i386/i386.cc | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 52f82185e32..acedc73b825 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26427,26 +26427,29 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
          TREE_VISITED (op) = 1;
          gimple *def = SSA_NAME_DEF_STMT (op);
          tree tem;
+         /* Look through a conversion.  */
          if (is_gimple_assign (def)
              && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
              && ((tem = gimple_assign_rhs1 (def)), true)
-             && TREE_CODE (tem) == SSA_NAME
-             /* A sign-change expands to nothing.  */
-             && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (def)),
-                                       TREE_TYPE (tem)))
+             && TREE_CODE (tem) == SSA_NAME)
            def = SSA_NAME_DEF_STMT (tem);
-         /* When the component is loaded from memory we can directly
-            move it to a vector register, otherwise we have to go
-            via a GPR or via vpinsr which involves similar cost.
-            Likewise with a BIT_FIELD_REF extracting from a vector
-            register we can hope to avoid using a GPR.  */
-         if (!is_gimple_assign (def)
-             || ((!gimple_assign_load_p (def)
-                  || (!TARGET_SSE4_1
-                      && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
-                 && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
-                     || !VECTOR_TYPE_P (TREE_TYPE
-                               (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
+         /* When the component is loaded from memory without sign-
+            or zero-extension we can move it to a vector register and/or
+            insert it via vpinsr with a memory operand.  */
+         if (gimple_assign_load_p (def)
+             && tree_nop_conversion_p (TREE_TYPE (op),
+                                       TREE_TYPE (gimple_assign_lhs (def)))
+             && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) > 1
+                 || TARGET_SSE4_1))
+           ;
+         /* When the component is extracted from a vector it is already
+            in a vector register.  */
+         else if (is_gimple_assign (def)
+                  && gimple_assign_rhs_code (def) == BIT_FIELD_REF
+                  && VECTOR_TYPE_P (TREE_TYPE
+                               (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))
+           ;
+         else
            {
              if (fp)
                {
-- 
2.51.0
