pr111023-2.c

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 06 Feb 2026 05:35:11 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120234


--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
r16-869-ge3d3d6d7d2c8ab has added

+                 /* For integer construction, the number of actual GPR -> XMM
+                    moves will be somewhere between 0 and n.
+                    We do not have very good idea about actual number, since
+                    the source may be a constant, memory or a chain of
+                    instructions that will be later converted by
+                    scalar-to-vector pass.  */
+                 if (kind == vec_construct
+                     && GET_MODE_BITSIZE (mode) == 256)
+                   cost *= 2;
+                 else if (kind == vec_construct
+                          && GET_MODE_BITSIZE (mode) == 512)
+                   cost *= 3;

which doesn't make much sense - we are tracking an upper bound on the
number of GPR to XMM moves, so multiplying the cost by 2 or 3 for each
unique scalar element source isn't justified.
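
To illustrate how the multiplier scales, here is a minimal standalone sketch.
It is not the GCC code: element_cost and construct_cost are hypothetical
helpers, and the base cost is whatever per-element GPR -> XMM cost the caller
passes in.  It only models the quoted hunk, assuming the adjusted cost is
added once per unique scalar source.

```c
/* Hypothetical model of the quoted hunk; not a GCC interface.
   BASE_COST is the per-element GPR -> XMM move cost, MODE_BITS the
   vector mode size in bits.  */
static int
element_cost (int base_cost, int mode_bits)
{
  int cost = base_cost;
  if (mode_bits == 256)
    cost *= 2;
  else if (mode_bits == 512)
    cost *= 3;
  return cost;
}

/* Assumption: the adjusted cost is charged once for each of the
   N_UNIQUE distinct scalar sources of the constructor.  */
static int
construct_cost (int base_cost, int mode_bits, int n_unique)
{
  return n_unique * element_cost (base_cost, mode_bits);
}
```

With an assumed base cost of 12, construct_cost (12, 128, 4) is 48, while
construct_cost (12, 256, 8) is 192 rather than 96, so the factor applies to
every element and not just once per vector.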

Of course for this bug we're dealing with SSE vector sizes, not affected by
this.  I cannot find a mail to gcc-patches for this change.

The overall change made the core cost for CTORs cheaper (from addss to sse_op),
so maybe this is to compensate.

But we're also using

  COSTS_N_INSNS (ix86_cost->integer_to_sse) / 2

instead of ix86_cost->sse_to_integer cost which duplicates costs, but
we do not apply that for ix86_cost->sse_op.  The integer to SSE cost is
then effectively 12 here, the same cost as a load from memory,
while the insertion cost is 4.
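
The arithmetic behind those numbers can be sketched as follows.
COSTS_N_INSNS is defined in rtl.h as shown.  The table values are
assumptions: integer_to_sse == 6 is back-derived from the "effectively 12"
above, and sse_op == COSTS_N_INSNS (1) from the insertion cost of 4.  Neither
is read from a specific tuning table.

```c
/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Assumed values, chosen only so the results match the numbers quoted
   in this comment; not taken from any particular ix86 cost table.  */
enum
{
  ASSUMED_INTEGER_TO_SSE = 6,
  ASSUMED_SSE_OP = COSTS_N_INSNS (1)
};

/* Hypothetical helper mirroring the expression
   COSTS_N_INSNS (ix86_cost->integer_to_sse) / 2.  */
static int
gpr_to_xmm_cost (int integer_to_sse)
{
  return COSTS_N_INSNS (integer_to_sse) / 2;
}
```

Under these assumptions gpr_to_xmm_cost (ASSUMED_INTEGER_TO_SSE) is 12 and
ASSUMED_SSE_OP is 4, i.e. one GPR -> XMM move is costed like three
insertions.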

I'll note it's all moot for the testcase, since what we are eventually
looking for is what forwprop does later but the backend refuses to let the
vectorizer do.

I think the costing works as intended, but the actual numbers seem quite off
(though the r16-869-ge3d3d6d7d2c8ab commit already indicated that they are).

It is IMO quite difficult to improve the high-level things iff the
low-level part (the target costing) is off.  I'll propose a partial revert
of r16-869-ge3d3d6d7d2c8ab, but that doesn't help here.

Best would be to not kneecap the vectorizer with the TARGET_MMX_WITH_SSE
guardrail here.  Or decide we simply don't care about -m32 and XFAIL there.

[Bug target/120234] [16 Regression] FAIL: gcc.target/i386/pr111023-2.c
