On Mon, Dec 8, 2025 at 2:41 PM Richard Biener <[email protected]> wrote:
>
> The following adjusts the costing of vector construction from scalars
> for FP modes: with 387 math the scalars can reside in x87 FP registers,
> which need to be spilled and reloaded into XMM registers. I've played
> it safe with mixed SSE/387 math.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> PR target/121230
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> With FP mode and 387 math cost spill/reload.
>
> * gcc.target/i386/pr121230.c: New testcase.
> ---
> gcc/config/i386/i386.cc | 15 ++++++++++++++-
> gcc/testsuite/gcc.target/i386/pr121230.c | 16 ++++++++++++++++
> 2 files changed, 30 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr121230.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index db43045753b..ad978d7474d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26397,7 +26397,20 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> (TREE_OPERAND (gimple_assign_rhs1 (def),
> 0))))))
> {
> if (fp)
> - m_num_sse_needed[where]++;
> + {
> + /* Scalar FP values residing in x87 registers need to be
> + spilled and reloaded. */
> + if (ix86_fpmath & FPMATH_387)

Perhaps you can use the IS_STACK_MODE() macro here; it determines more
precisely which modes are handled in stack registers.

Uros.

> + {
> + auto mode2 = TYPE_MODE (TREE_TYPE (op));
> + int cost
> + = (ix86_cost->hard_register.fp_store[mode2 == SFmode
> + ? 0 : 1]
> + + ix86_cost->sse_load[sse_store_index (mode2)]);
> + stmt_cost += COSTS_N_INSNS (cost) / 2;
> + }
> + m_num_sse_needed[where]++;
> + }
> else
> {
> m_num_gpr_needed[where]++;
> diff --git a/gcc/testsuite/gcc.target/i386/pr121230.c b/gcc/testsuite/gcc.target/i386/pr121230.c
> new file mode 100644
> index 00000000000..67c9c5ccb2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr121230.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O3 -march=athlon-xp -mfpmath=387 -fexcess-precision=standard" } */
> +
> +typedef struct {
> + float a;
> + float b;
> +} f32_2;
> +
> +f32_2 add32_2(f32_2 x, f32_2 y) {
> + return (f32_2){ x.a + y.a, x.b + y.b};
> +}
> +
> +/* We do not want the vectorizer to vectorize the store and/or the
> + conversion (with IA32 we do not support V2SF add) given that doing so
> + spills FP regs to reload them to XMM. */
> +/* { dg-final { scan-assembler-not "movss\[ \\t\]+\[0-9\]*\\\(%esp\\\), %xmm" } } */
> --
> 2.51.0
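
To illustrate the IS_STACK_MODE() remark above, here is a small standalone
model of how the two conditions differ. It is not GCC source: the Target
struct, the lower-case helper names and the sample targets are made up for
illustration, and the predicate bodies follow the definitions in
config/i386/i386.h as I recall them, so please check them against the tree
before relying on this.

```cpp
// Toy model only; mirrors the i386.h predicates from memory (assumption).
enum Mode { SFmode, DFmode, XFmode };
enum FpMath : unsigned { FPMATH_387 = 1u << 0, FPMATH_SSE = 1u << 1 };

// Hypothetical stand-in for the global target flags.
struct Target
{
  bool has_80387;
  bool has_sse;
  bool has_sse2;
  unsigned fpmath;
};

constexpr bool x87_float_mode_p (const Target &t, Mode m)
{
  return t.has_80387 && (m == SFmode || m == DFmode || m == XFmode);
}

constexpr bool sse_float_mode_p (const Target &t, Mode m)
{
  return (t.has_sse && m == SFmode) || (t.has_sse2 && m == DFmode);
}

constexpr bool sse_math_p (const Target &t)
{
  return (t.fpmath & FPMATH_SSE) != 0;
}

constexpr bool mix_sse_i387_p (const Target &t)
{
  return (t.fpmath & (FPMATH_SSE | FPMATH_387)) == (FPMATH_SSE | FPMATH_387);
}

// Model of IS_STACK_MODE (MODE): the mode lives in x87 stack registers.
constexpr bool is_stack_mode (const Target &t, Mode m)
{
  return x87_float_mode_p (t, m)
	 && (!(sse_float_mode_p (t, m) && sse_math_p (t))
	     || mix_sse_i387_p (t));
}

// Model of the condition used in the patch, which ignores the mode.
constexpr bool patch_condition (const Target &t)
{
  return (t.fpmath & FPMATH_387) != 0;
}

// Sample configurations (illustrative, not taken from the PR).
constexpr Target athlon_xp_387 = { true, true, false, FPMATH_387 };
constexpr Target sse2_sse_math = { true, true, true, FPMATH_SSE };
constexpr Target sse1_sse_math = { true, true, false, FPMATH_SSE };
constexpr Target mixed_math = { true, true, true, FPMATH_387 | FPMATH_SSE };

// The testcase configuration: both conditions agree.
static_assert (is_stack_mode (athlon_xp_387, SFmode)
	       && patch_condition (athlon_xp_387), "387 math, SFmode");
// SSE math without SSE2: DFmode is still an x87 mode in this model.
static_assert (is_stack_mode (sse1_sse_math, DFmode)
	       && !patch_condition (sse1_sse_math), "SSE1 only, DFmode");
```

In this model the two conditions agree for the testcase options and for
mixed math, and they differ only where a mode stays on the x87 stack even
though -mfpmath=387 is not selected. Whether such a case can reach this
costing path for a vectorizable mode is a separate question that the model
does not answer.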