On Tue, Feb 24, 2026 at 10:24 PM Richard Biener <[email protected]> wrote:
>
> On Tue, 24 Feb 2026, Richard Biener wrote:
>
> > The following allows vectorizing the gcc.target/i386/pr111023*.c
> > testcases again with -m32 -msse2 by ensuring we see through a cast
> > when looking for memory or vector extract sources during costing
> > of vector construction.
> >
> > This, together with the forwprop fix, fixes the regression on those
> > testcases.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > OK if that succeeds?
>
> While that succeeds, experimenting shows that zero- and sign-extends
> are not handled when moving from memory.  I think we can do zero-extends
> for SImode and DImode (movd/movq) and for smaller modes via pre-zeroing
> of %xmm and pinsr.  I'm leaving that for a separate follow-up.  Below is a revised
> patch that cleans up the various conditions and only touches the
> vector extract [ -> conversion ] -> vector CTOR path to allow all
> conversions.
>
> Another option would be to not disable MMX <-> SSE conversion patterns
> with -m32 or to revert another part of Honza's cost changes which regressed
> those testcases (kill the * 2 multiplication).
>
> Re-testing below patch.
>
> OK?

LGTM.

>
> Thanks,
> Richard.
>
> From ac2a80af61d57ff686dbdbd97095e1c329c250e5 Mon Sep 17 00:00:00 2001
> From: Richard Biener <[email protected]>
> Date: Tue, 24 Feb 2026 09:53:00 +0100
> Subject: [PATCH] target/120234 - adjust vector construction costs
> To: [email protected]
>
> The following allows vectorizing the gcc.target/i386/pr111023*.c
> testcases again with -m32 -msse2 by ensuring we see through a cast
> when looking for vector extract sources during costing of vector construction.
>
> This, together with the forwprop fix, fixes the regression on those testcases.
>
>         PR target/120234
>         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>         For constructor elements always look through a conversion.
>         Rewrite load and vector extraction matching to be more obvious.
>         Allow arbitrary conversions from the vector extract to elide
>         costing of a gpr<->xmm move.
> ---
>  gcc/config/i386/i386.cc | 35 +++++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 52f82185e32..acedc73b825 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26427,26 +26427,29 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>           TREE_VISITED (op) = 1;
>           gimple *def = SSA_NAME_DEF_STMT (op);
>           tree tem;
> +         /* Look through a conversion.  */
>           if (is_gimple_assign (def)
>               && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
>               && ((tem = gimple_assign_rhs1 (def)), true)
> -             && TREE_CODE (tem) == SSA_NAME
> -             /* A sign-change expands to nothing.  */
> -             && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (def)),
> -                                       TREE_TYPE (tem)))
> +             && TREE_CODE (tem) == SSA_NAME)
>             def = SSA_NAME_DEF_STMT (tem);
> -         /* When the component is loaded from memory we can directly
> -            move it to a vector register, otherwise we have to go
> -            via a GPR or via vpinsr which involves similar cost.
> -            Likewise with a BIT_FIELD_REF extracting from a vector
> -            register we can hope to avoid using a GPR.  */
> -         if (!is_gimple_assign (def)
> -             || ((!gimple_assign_load_p (def)
> -                  || (!TARGET_SSE4_1
> -                      && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
> -                 && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
> -                     || !VECTOR_TYPE_P (TREE_TYPE
> -                               (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
> +         /* When the component is loaded from memory without sign-
> +            or zero-extension we can move it to a vector register and/or
> +            insert it via vpinsr with a memory operand.  */
> +         if (gimple_assign_load_p (def)
> +             && tree_nop_conversion_p (TREE_TYPE (op),
> +                                       TREE_TYPE (gimple_assign_lhs (def)))
> +             && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) > 1
> +                 || TARGET_SSE4_1))
> +           ;
> +         /* When the component is extracted from a vector it is already
> +            in a vector register.  */
> +         else if (is_gimple_assign (def)
> +                  && gimple_assign_rhs_code (def) == BIT_FIELD_REF
> +                  && VECTOR_TYPE_P (TREE_TYPE
> +                               (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))
> +           ;
> +         else
>             {
>               if (fp)
>                 {
> --
> 2.51.0
>
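
For readers following the thread, the rewritten three-way condition in the
patch can be modeled outside of GCC roughly as below.  This is only a sketch
of the decision logic as I read the diff, not the code in i386.cc: the struct
ElementDef and the function needs_gpr_to_xmm_cost are hypothetical names, and
each field stands in for the GCC predicate named in its comment.

```cpp
// Hypothetical model of what the patch inspects about the statement that
// defines one constructor element, after looking through a conversion.
struct ElementDef {
    bool is_load;            // stands in for gimple_assign_load_p (def)
    bool nop_conversion;     // stands in for tree_nop_conversion_p between
                             // the element type and the loaded value's type
    unsigned mode_size;      // stands in for GET_MODE_SIZE of the element mode
    bool is_vector_extract;  // stands in for a BIT_FIELD_REF whose operand
                             // has vector type
};

// Returns true when the element would still be charged the extra
// GPR<->XMM move cost, i.e. when the final "else" branch is reached.
bool needs_gpr_to_xmm_cost(const ElementDef &d, bool target_sse4_1) {
    // Loaded from memory without sign- or zero-extension: can be moved or
    // inserted directly.  Byte-sized elements additionally need SSE4.1.
    if (d.is_load && d.nop_conversion
        && (d.mode_size > 1 || target_sse4_1))
        return false;
    // Extracted from a vector: already in a vector register, and any
    // conversion in between is accepted.
    if (d.is_vector_extract)
        return false;
    return true;
}
```

In this model a byte load without SSE4.1, or a load that goes through an
extension, still falls through to the costed path, while a vector extract
never does, whatever conversion sits between it and the constructor.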


-- 
BR,
Hongtao
