Issue 165813
Summary [clang] Optimisation regression in trunk for x86-64 AVX2 shuffle
Labels clang
Assignees
Reporter desal
    There appears to be an optimisation regression between clang 21.1.0 and trunk in the handling of AVX2 shuffle operations. It seems to manifest only when shuffling the result of a comparison (e.g. `_mm256_cmpeq_epi32`). Instead of a single `vpshufd` on the ymm register, clang now prefers to pack the result into a 128-bit xmm register, do the shuffling there, and then reconstruct the 256-bit ymm output.


Test case (Compiler Explorer: https://gcc.godbolt.org/z/xGGs8hW5P)
Compiled with `-O2 -mavx2`

```
#include <immintrin.h>

__m256i foo(__m256i a, __m256i b) {
    __m256i x = _mm256_cmpeq_epi32(a, b);
    return _mm256_shuffle_epi32(x, 0b11101111); 
}
```
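For reference, a scalar sketch of what the test case is expected to compute, which may help when checking that both code sequences below agree. The names `ref_cmpeq_epi32`, `ref_shuffle_epi32` and `ref_foo` are hypothetical helpers, not part of any header. The model assumes the documented behaviour of the two intrinsics: the compare yields all-ones or zero per 32-bit element, and the shuffle applies the same 2-bit selectors to each 128-bit half independently.

```c
#include <stdint.h>

/* Scalar model of _mm256_cmpeq_epi32: -1 (all ones) where equal, else 0. */
static void ref_cmpeq_epi32(const int32_t a[8], const int32_t b[8],
                            int32_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (a[i] == b[i]) ? -1 : 0;
}

/* Scalar model of _mm256_shuffle_epi32: within each 128-bit half,
   output element i copies source element (imm >> 2*i) & 3 of that half. */
static void ref_shuffle_epi32(const int32_t x[8], unsigned imm,
                              int32_t out[8]) {
    for (int half = 0; half < 2; ++half)
        for (int i = 0; i < 4; ++i)
            out[half * 4 + i] = x[half * 4 + ((imm >> (2 * i)) & 3u)];
}

/* Hypothetical scalar counterpart of foo() from the test case. */
static void ref_foo(const int32_t a[8], const int32_t b[8], int32_t out[8]) {
    int32_t x[8];
    ref_cmpeq_epi32(a, b, x);
    ref_shuffle_epi32(x, 0xEF /* 0b11101111 == 239 */, out);
}
```

With the immediate 239 the selectors are 3, 3, 2, 3, so each half of the result is `[x3, x3, x2, x3]` of the corresponding half of the compare mask.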

Clang trunk `clang version 22.0.0git (https://github.com/llvm/llvm-project.git 03e66aeb96928592ee6cd51913bf72a6e21066fc)`

```
foo(long long vector[4], long long vector[4]):
  vpcmpeqd ymm0, ymm0, ymm1
  vextracti128 xmm1, ymm0, 1
  vpackssdw xmm0, xmm0, xmm1
  vpshuflw xmm0, xmm0, 239
  vpshufhw xmm0, xmm0, 239
  vpmovsxwd ymm0, xmm0
  ret
```
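My reading of the trunk sequence is that it narrows the compare mask to 16-bit elements, shuffles the low and high groups of four words with the same immediate, and sign-extends back to 32 bits. A scalar sketch of that reading follows; `sat16` and `model_trunk_sequence` are hypothetical names, and the input `x` is assumed to hold the eight compare results (each 0 or -1), so the saturation in `vpackssdw` does not change any value.

```c
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as vpackssdw applies per element. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of the six-instruction trunk output.
   x is the result of vpcmpeqd, viewed as eight 32-bit elements. */
static void model_trunk_sequence(const int32_t x[8], unsigned imm,
                                 int32_t out[8]) {
    int16_t packed[8];
    int16_t shuffled[8];

    /* vextracti128 + vpackssdw: low half becomes words 0..3,
       high half becomes words 4..7. */
    for (int i = 0; i < 8; ++i)
        packed[i] = sat16(x[i]);

    /* vpshuflw permutes words 0..3, vpshufhw permutes words 4..7,
       both using the same 2-bit selectors from imm. */
    for (int half = 0; half < 2; ++half)
        for (int i = 0; i < 4; ++i)
            shuffled[half * 4 + i] = packed[half * 4 + ((imm >> (2 * i)) & 3u)];

    /* vpmovsxwd: sign-extend each word back to 32 bits. */
    for (int i = 0; i < 8; ++i)
        out[i] = (int32_t)shuffled[i];
}
```

Under that assumption the result matches a per-half 32-bit shuffle of the mask with the same immediate, so the longer sequence looks correct but costs six instructions.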

Clang 21.1.0:

```
foo(long long vector[4], long long vector[4]):
  vpcmpeqd ymm0, ymm0, ymm1
  vpshufd ymm0, ymm0, 239
  ret
```