https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122308
--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Biener from comment #1)
> I don't see us vectorizer the outer loop:
>
> t.c:7:21: note: analyze in outer loop: *(&b + (sizetype) index_40 * 2)
> analyze_innermost: t.c:12:18: missed: failed: evolution of base is not affine.
> t.c:7:21: missed: bad data references.
> t.c:7:21: note: ***** Analysis failed with vector mode V4SI

I may have misdescribed the problem. On aarch64, LLVM generates much more compact vectorized code for the inner loop: the addition statement is mapped directly to a vector<short> add. GCC does not do this.

https://godbolt.org/z/Wadvdzozq
