https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122308
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-10-17
            Summary|Inefficient vectorization   |Inefficient vectorization
                   |on inner loop               |on inner loop due to
                   |                            |unroll-and-jam
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Feng Xue from comment #2)
> (In reply to Richard Biener from comment #1)
> > I don't see us vectorizing the outer loop:
> >
> > t.c:7:21: note: analyze in outer loop: *(&b + (sizetype) index_40 * 2)
> > analyze_innermost: t.c:12:18: missed: failed: evolution of base is not
> > affine.
> > t.c:7:21: missed: bad data references.
> > t.c:7:21: note: ***** Analysis failed with vector mode V4SI
>
> I may have misdescribed the problem. On aarch64, LLVM generates much more
> compact vectorized code for the inner loop: the addition statement maps
> directly to a vector<short> add. GCC does not do this.
>
> https://godbolt.org/z/Wadvdzozq
I think it's fine; the issue seems to be that we apply unroll-and-jam
and the version of the loop it produces confuses us somehow. The loop
for (int j = 0; j < 1024; ++j)
{
a[j] += b[index + j];
a[j] += b[index2 + j];
}
is fine. On x86 we don't vectorize the unrolled-and-jammed version at all:
t.c:11:25: note: ==> examining statement: _48 = b[_47];
t.c:11:25: missed: unsupported vector types for emulated gather.
t.c:12:18: missed: not vectorized: relevant stmt not supported: _48 = b[_47];
t.c:11:25: note: unsupported SLP instance starting from: a[j_23] = _51;
t.c:11:25: missed: unsupported SLP instances
So this also shows that we fail to analyze one of the data-refs and fall
back to gathers.