Issue: 137894
Summary: Missed autovectorization opportunity
Labels:
Assignees:
Reporter: MatzeB

Got a report of a simple loop that should autovectorize but does not do so on aarch64 (but does on x86 / AVX512). Repro:
```
#include <stdint.h>
#include <stdlib.h>
void noAutovec(uint32_t* __restrict ip, float* __restrict src, float* __restrict dst, size_t n) {
  // If you encourage the compiler with the `#pragma`, this does autovectorize.
  // #pragma clang loop vectorize(enable)
  for (size_t i = 0; i < n; ++i) {
    uint32_t idx = ip[i];
    dst[i] = src[idx];
  }
}
```
This vectorizes on x86 (`clang -march=haswell -mavx512f -O3`) but does not on aarch64 in my experiments (`clang -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16`).
Adding `#pragma clang loop vectorize(enable)` makes the loop vectorize on aarch64, so the loop is legal to vectorize. This hints that the cost model is what rejects it (I assume vectorization should be beneficial when SVE is available).
(This mirrors Meta T222824954.)
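For anyone who wants to compare the two behaviours in one translation unit, here is a minimal sketch that puts the loop from the repro next to a copy with the pragma enabled. The names `gatherDefault` and `gatherForced` are hypothetical and not part of the original report, and the `const` qualifiers are my addition. Both functions should produce identical results; only the vectorizer's decision is expected to differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop from the report, left to the vectorizer's cost model.
   (Hypothetical name; the report calls this function noAutovec.) */
void gatherDefault(const uint32_t* __restrict ip, const float* __restrict src,
                   float* __restrict dst, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    uint32_t idx = ip[i];
    dst[i] = src[idx];
  }
}

/* Same loop with the pragma from the report enabled, which asks clang
   to vectorize even if the cost model considers it unprofitable. */
void gatherForced(const uint32_t* __restrict ip, const float* __restrict src,
                  float* __restrict dst, size_t n) {
#pragma clang loop vectorize(enable)
  for (size_t i = 0; i < n; ++i) {
    uint32_t idx = ip[i];
    dst[i] = src[idx];
  }
}
```

Compiling this with the aarch64 flags above plus clang's optimization remarks (`-Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`) should show which of the two loops was vectorized and may give the reason the other was skipped.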
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs