On Mon, Dec 8, 2025 at 2:41 PM Richard Biener <[email protected]> wrote:
>
> The following adjusts costing of vector construction from scalars for
> FP modes which, with 387 math, can reside in FP regs and thus need to be
> spilled and reloaded into XMM registers.  I've stayed on the safe side
> with mixed SSE/387 math.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
>         PR target/121230
>         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>         With FP mode and 387 math cost spill/reload.
>
>         * gcc.target/i386/pr121230.c: New testcase.
> ---
>  gcc/config/i386/i386.cc                  | 15 ++++++++++++++-
>  gcc/testsuite/gcc.target/i386/pr121230.c | 16 ++++++++++++++++
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr121230.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index db43045753b..ad978d7474d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26397,7 +26397,20 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>                                 (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
>             {
>               if (fp)
> -               m_num_sse_needed[where]++;
> +               {
> +                 /* Scalar FP values residing in x87 registers need to be
> +                    spilled and reloaded.  */
> +                 if (ix86_fpmath & FPMATH_387)

Perhaps you can use the IS_STACK_MODE() macro; it determines more
precisely which modes are handled in stack registers.
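
Something along these lines, as an untested sketch (reusing the mode2
computation from your hunk above):

  auto mode2 = TYPE_MODE (TREE_TYPE (op));
  /* Charge the spill/reload only when the scalar lives in x87 stack regs.  */
  if (IS_STACK_MODE (mode2))
    {
      int cost
        = (ix86_cost->hard_register.fp_store[mode2 == SFmode ? 0 : 1]
           + ix86_cost->sse_load[sse_store_index (mode2)]);
      stmt_cost += COSTS_N_INSNS (cost) / 2;
    }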

Uros.

> +                   {
> +                     auto mode2 = TYPE_MODE (TREE_TYPE (op));
> +                     int cost
> +                       = (ix86_cost->hard_register.fp_store[mode2 == SFmode
> +                                                            ? 0 : 1]
> +                          + ix86_cost->sse_load[sse_store_index (mode2)]);
> +                     stmt_cost += COSTS_N_INSNS (cost) / 2;
> +                   }
> +                 m_num_sse_needed[where]++;
> +               }
>               else
>                 {
>                   m_num_gpr_needed[where]++;
> diff --git a/gcc/testsuite/gcc.target/i386/pr121230.c b/gcc/testsuite/gcc.target/i386/pr121230.c
> new file mode 100644
> index 00000000000..67c9c5ccb2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr121230.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O3 -march=athlon-xp -mfpmath=387 -fexcess-precision=standard" } */
> +
> +typedef struct {
> +    float a;
> +    float b;
> +} f32_2;
> +
> +f32_2 add32_2(f32_2 x, f32_2 y) {
> +    return (f32_2){ x.a + y.a, x.b + y.b};
> +}
> +
> +/* We do not want the vectorizer to vectorize the store and/or the
> +   conversion (with IA32 we do not support V2SF add), given that this
> +   spills FP regs only to reload them into XMM.  */
> +/* { dg-final { scan-assembler-not "movss\[ \\t\]+\[0-9\]*\\\(%esp\\\), %xmm" } } */
> --
> 2.51.0
