https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122722
Bug ID: 122722
Summary: Fail to SLP vectorize in-order reduction pairs
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
void foo (double * __restrict sums, double *a, double *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      sums[0] = sums[0] + a[2*i];
      sums[1] = sums[1] + a[2*i+1];
      sums[2] = sums[2] + b[2*i];
      sums[3] = sums[3] + b[2*i+1];
    }
}
should be vectorizable with V2DFmode, in pairs for 'a' and 'b', even when using
in-order reductions. But SLP discovery for the SLP reduction covering all
four reductions fails, and we fall back to single-lane reductions, which is not
profitable.
With -ffast-math we get a profitable reduction, but with too high a
vectorization factor and with interleaving required for the memory accesses.
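
For illustration, the pairwise in-order form could be written by hand roughly
as follows. This is only a sketch using the GCC vector extension, not what the
vectorizer emits; the names foo_v2df and v2df are made up for this example.
Each lane accumulates its elements in the same order as the scalar loop, so no
reassociation (and no -ffast-math) is needed.

```c
#include <string.h>

/* Two-lane double vector, the type-level counterpart of V2DFmode.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical hand-vectorized variant of foo: one V2DF accumulator for
   the 'a' pair (sums[0], sums[1]) and one for the 'b' pair
   (sums[2], sums[3]).  */
void foo_v2df (double * __restrict sums, double *a, double *b, int n)
{
  v2df acc_a, acc_b;
  /* memcpy is used for unaligned vector loads and stores.  */
  memcpy (&acc_a, &sums[0], sizeof acc_a);
  memcpy (&acc_b, &sums[2], sizeof acc_b);
  for (int i = 0; i < n; ++i)
    {
      v2df va, vb;
      memcpy (&va, &a[2*i], sizeof va);   /* a[2*i], a[2*i+1] */
      memcpy (&vb, &b[2*i], sizeof vb);   /* b[2*i], b[2*i+1] */
      /* Lane-wise add: each lane sees its addends in scalar order.  */
      acc_a = acc_a + va;
      acc_b = acc_b + vb;
    }
  memcpy (&sums[0], &acc_a, sizeof acc_a);
  memcpy (&sums[2], &acc_b, sizeof acc_b);
}
```

Both loads are contiguous, so this form needs no interleaving and processes
one scalar iteration per vector iteration.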