https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122308

--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Biener from comment #1)
> I don't see us vectorize the outer loop:
> 
> t.c:7:21: note:   analyze in outer loop: *(&b + (sizetype) index_40 * 2)
> analyze_innermost: t.c:12:18: missed:   failed: evolution of base is not affine.
> t.c:7:21: missed:  bad data references.
> t.c:7:21: note:  ***** Analysis failed with vector mode V4SI

I may have misdescribed the problem. On aarch64, LLVM generates much more
compact vectorized code for the inner loop: the addition statement maps
directly to a vector<short> add. GCC does not do this.

https://godbolt.org/z/Wadvdzozq
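The test case itself is only behind the godbolt link, so the following is a hypothetical C sketch, not the reported kernel. It assumes the operands are 16-bit elements whose sum is stored back to a 16-bit element. Under that assumption, C's promotion to int followed by truncation on the store yields the same bits as a wrapping 16-bit lane add, which is what makes a direct vector<short> add legal (8 lanes per 128-bit register, versus 4 lanes for V4SI). It uses the GCC/Clang `vector_size` extension; the names `v8u16`, `add_promoted`, `count_mismatches` and `self_check` are illustrative only. Unsigned lanes are used so that lane overflow wraps by definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: eight 16-bit unsigned lanes (128 bits, the
   width of one AArch64 Q register). Unsigned so lane overflow wraps. */
typedef uint16_t v8u16 __attribute__((vector_size(16)));

/* Scalar reference with C semantics: both operands are promoted to int,
   added in 32-bit precision, then truncated to 16 bits on the store. */
uint16_t add_promoted(uint16_t x, uint16_t y) {
    int wide = (int)x + (int)y;
    return (uint16_t)wide;
}

/* Adds eight lanes with a single 16-bit-element vector add (no widening)
   and returns how many lanes differ from the promoted scalar reference. */
int count_mismatches(const uint16_t a[8], const uint16_t b[8]) {
    v8u16 va, vb, vs;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vs = va + vb; /* lane-wise add, wraps modulo 2^16 */
    int bad = 0;
    for (int i = 0; i < 8; i++)
        if (vs[i] != add_promoted(a[i], b[i]))
            bad++;
    return bad;
}

/* Usage: includes operand pairs whose sum overflows 16 bits. */
void self_check(void) {
    const uint16_t a[8] = {1, 2, 3, 32767, 32768, 65535, 40000, 65535};
    const uint16_t b[8] = {1, 2, 3, 1, 32768, 1, 30000, 65535};
    assert(count_mismatches(a, b) == 0);
}
```

Because truncation commutes with addition modulo 2^16, no lane needs the widened 32-bit intermediate; that is the property a vectorizer can use to emit a single 16-bit vector add instead of widening to V4SI and narrowing again.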
