| Field | Value |
|---|---|
| Issue | 141347 |
| Summary | [X86] `const << (x&7)` doesn't use `shlx` when BMI2 is available |
| Labels | new issue |
| Assignees | |
| Reporter | dzaima |

These functions:
```c
#include <stdint.h>

void shl_u8(uint8_t* dst, uint64_t c) {
  *dst = 1 << (c&7);
}
void shr_u8(uint8_t* dst, uint64_t c) {
  *dst = 0xaa >> (c&7);
}
```
compiled by clang with `-O3 -march=haswell`, produce:
```asm
shl_u8:
  mov rcx, rsi
  and cl, 7
  mov al, 1
  shl al, cl
  mov byte ptr [rdi], al
  ret
shr_u8:
  mov rcx, rsi
  and cl, 7
  mov al, -86
  shr al, cl
  mov byte ptr [rdi], al
  ret
```
but they could use `shlx` & `shrx` as gcc does, e.g.:
```asm
shl_u8:
  and esi, 7
  mov eax, 1
  shlx esi, eax, esi
  mov BYTE PTR [rdi], sil
  ret
```
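This matters even more in a loop. The legacy `shl`/`shr` forms overwrite their source register, so clang's version ends up reloading the constant every iteration, whereas `shlx`/`shrx` write to a separate destination and can reuse a constant loaded once outside the loop. The result is clang taking 4 uops per iteration on Haswell, versus 1 uop per iteration for gcc.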
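
For illustration, a minimal sketch of the kind of loop where this shows up. The function name `shl_u8_loop` and its signature are hypothetical and are not taken from the godbolt link below. At `-O3` the compiler may auto-vectorize a loop like this, in which case the scalar shift does not appear at all; disabling vectorization (for example `-fno-vectorize` with clang or `-fno-tree-vectorize` with gcc) is probably needed to see the scalar codegen.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop variant of shl_u8: the shifted constant (1) is loop
 * invariant, so a non-destructive shift could keep it in one register
 * for the whole loop instead of rematerializing it each iteration. */
void shl_u8_loop(uint8_t *dst, const uint64_t *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* The count is masked to 0..7, so the result always fits in a byte. */
        dst[i] = (uint8_t)(1u << (c[i] & 7));
    }
}
```

With the scalar loop kept, the thing to compare is whether the `mov` of the constant sits inside or outside the loop body.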
https://godbolt.org/z/Yc57PsWKE
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs