https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398
Richard Biener <rguenth at gcc dot gnu.org> changed:
What             |Removed                       |Added
----------------------------------------------------------------------------
Status           |UNCONFIRMED                   |ASSIGNED
See Also         |                              |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122746
Ever confirmed   |0                             |1
Last reconfirmed |                              |2026-02-16
Assignee         |unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
This could be vectorized with an in-order SLP reduction. The issue with GCC
15/16 is that we can now use larger vectors, but inefficiently, and that x86
does not compare vector costs, which would have made it choose the 8-byte
vector version from GCC 14. Mine for that part; you can see that effect with
--param ix86-vect-compare-costs=1 now:
> ./cc1 -quiet t.c -O2 -fopt-info-vec --param ix86-vect-compare-costs=1
t.c:8:12: optimized: loop vectorized using 8 byte vectors and unroll factor 1
.L3:
movq (%rdi,%rax,8), %xmm0
movq %xmm1, %xmm1
addq $1, %rax
mulps %xmm0, %xmm0
movq %xmm0, %xmm0
addps %xmm1, %xmm0
movaps %xmm0, %xmm1
cmpq %rax, %rsi
jne .L3
shufps $0xe5, %xmm0, %xmm0
addss %xmm1, %xmm0
ret
so there's a workaround for GCC 16.
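
For readers without the testcase at hand, here is a minimal sketch of the kind of loop that could plausibly produce code shaped like the assembly above. It is an assumption, not the t.c from this PR: the function name sum_sq_pairs, the interleaved layout, and the two separate accumulators are all hypothetical. It is only meant to show why an 8-byte (two-float) vector is a natural fit: each iteration touches one pair of adjacent floats, and the two lanes are summed once after the loop.

```c
#include <stddef.h>

/* Hypothetical loop shape, NOT the testcase from the PR.
   x holds n interleaved pairs: x[2*i] and x[2*i + 1].  */
float sum_sq_pairs(const float *x, size_t n)
{
    /* Two independent accumulators, one per lane of a 2 x float vector. */
    float s0 = 0.0f;
    float s1 = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        s0 += x[2 * i] * x[2 * i];         /* lane 0: square and accumulate */
        s1 += x[2 * i + 1] * x[2 * i + 1]; /* lane 1: square and accumulate */
    }

    /* Final cross-lane add, done once after the loop. */
    return s0 + s1;
}
```

Whether GCC actually picks 8-byte vectors for this sketch depends on the version, target flags, and the --param mentioned above; compiling with -O2 -fopt-info-vec is one way to check.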