| Field | Value |
|---|---|
| Issue | 165813 |
| Summary | [clang] Optimisation regression in trunk for x86-64 AVX2 shuffle |
| Labels | clang |
| Assignees | |
| Reporter | desal |

Looks like there's an optimisation regression between clang 21.1.0 and trunk in the handling of AVX2 shuffle operations. It seems to manifest only when shuffling the result of a comparison (e.g. `_mm256_cmpeq_epi32`). Clang now prefers to pack the comparison result into a 128-bit xmm register, do the shuffling there, and then reconstruct the 256-bit ymm output.
Test case (Compiler Explorer: https://gcc.godbolt.org/z/xGGs8hW5P)
Compiled with `-O2 -mavx2`
```
#include <immintrin.h>
__m256i foo(__m256i a, __m256i b) {
    __m256i x = _mm256_cmpeq_epi32(a, b);
    return _mm256_shuffle_epi32(x, 0b11101111);
}
```
Clang trunk (`clang version 22.0.0git (https://github.com/llvm/llvm-project.git 03e66aeb96928592ee6cd51913bf72a6e21066fc)`):
```
foo(long long vector[4], long long vector[4]):
        vpcmpeqd        ymm0, ymm0, ymm1
        vextracti128    xmm1, ymm0, 1
        vpackssdw       xmm0, xmm0, xmm1
        vpshuflw        xmm0, xmm0, 239
        vpshufhw        xmm0, xmm0, 239
        vpmovsxwd       ymm0, xmm0
        ret
```
Clang 21.1.0:
```
foo(long long vector[4], long long vector[4]):
        vpcmpeqd        ymm0, ymm0, ymm1
        vpshufd         ymm0, ymm0, 239
        ret
```
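
For reference, the immediate `0b11101111` (239) decodes to the per-lane element selectors 3, 3, 2, 3, which is why a single `vpshufd` is enough in the 21.1.0 output. The following is only a scalar sketch of the two intrinsics' semantics, not code taken from clang; the names `cmpeq_epi32_model` and `shuffle_epi32_model` are hypothetical and exist just for illustration.

```
#include <stdint.h>

/* Hypothetical scalar model of _mm256_cmpeq_epi32:
   each 32-bit element becomes all-ones if equal, zero otherwise. */
static void cmpeq_epi32_model(const int32_t a[8], const int32_t b[8],
                              int32_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (a[i] == b[i]) ? -1 : 0;
}

/* Hypothetical scalar model of _mm256_shuffle_epi32:
   the same 8-bit immediate is applied independently to each 128-bit lane. */
static void shuffle_epi32_model(const int32_t src[8], int32_t dst[8],
                                unsigned imm8) {
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 4; ++i) {
            /* 2-bit selector for destination element i within the lane */
            unsigned sel = (imm8 >> (2 * i)) & 3u;
            dst[lane * 4 + i] = src[lane * 4 + sel];
        }
    }
}
```

Feeding the output of `cmpeq_epi32_model` into `shuffle_epi32_model` with `0xEF` gives a reference result that either assembly sequence above should reproduce, which may help when checking that the two are equivalent.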
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs