Issue 137894
Summary Missed autovectorization opportunity
Labels
Assignees
Reporter MatzeB
Got a report of a simple loop that should autovectorize but does not on aarch64 (it does on x86 with AVX-512). Repro:

```
#include <stdint.h>
#include <stdlib.h>

void noAutovec(uint32_t* __restrict ip, float* __restrict src, float* __restrict dst, size_t n) {
    // With the `#pragma` below enabled, this loop does autovectorize.
    // #pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; ++i) {
        uint32_t idx = ip[i];
        dst[i] = src[idx];
    }
}
```

This vectorizes on x86 (`clang -march=haswell -mavx512f -O3`) but does not on aarch64 in my experiments (`clang -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16`).

Adding `#pragma clang loop vectorize(enable)` makes vectorization work on aarch64, which suggests the cost model is rejecting the loop (I assume vectorization should be beneficial when SVE is available).
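
For comparison, here is a minimal sketch of the pragma-enabled variant next to a forced-scalar reference, with a small self-check that both produce the same output. The function names (`gather_with_pragma`, `gather_scalar`, `gather_self_check`) are made up for illustration and are not part of the original report. Compiling it with Clang's optimization remarks (`-Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`) should show what the vectorizer decided for each loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: same gather loop as the repro, with vectorization
 * requested explicitly via the Clang loop pragma. */
void gather_with_pragma(const uint32_t* __restrict ip, const float* __restrict src,
                        float* __restrict dst, size_t n) {
#pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[ip[i]];
    }
}

/* Hypothetical name: scalar reference with vectorization disabled. */
void gather_scalar(const uint32_t* __restrict ip, const float* __restrict src,
                   float* __restrict dst, size_t n) {
#pragma clang loop vectorize(disable)
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[ip[i]];
    }
}

/* Hypothetical helper: asserts that both variants agree on a small input. */
void gather_self_check(void) {
    enum { N = 64 };
    uint32_t ip[N];
    float src[N], vec_out[N], ref_out[N];

    for (size_t i = 0; i < N; ++i) {
        ip[i] = (uint32_t)((i * 7u) % N); /* in-bounds shuffled indices */
        src[i] = (float)i * 0.5f;
    }

    gather_with_pragma(ip, src, vec_out, N);
    gather_scalar(ip, src, ref_out, N);

    for (size_t i = 0; i < N; ++i) {
        assert(vec_out[i] == ref_out[i]);
    }
}
```

Calling `gather_self_check()` from a `main` only confirms that the two loops are equivalent; whether the first one actually gets vectorized still has to be read from the remarks or the generated assembly.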

(This mirrors Meta T222824954.)
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
