Issue 141347
Summary [X86] `const << (x&7)` doesn't use `shlx` when BMI2 is available
Labels new issue
Assignees
Reporter dzaima
These functions:
```c
#include <stdint.h>

void shl_u8(uint8_t* dst, uint64_t c) {
    *dst = 1 << (c&7);
}
void shr_u8(uint8_t* dst, uint64_t c) {
    *dst = 0xaa >> (c&7);
}
```
compiled with `-O3 -march=haswell` produce:
```asm
shl_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, 1
        shl     al, cl
        mov     byte ptr [rdi], al
        ret

shr_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, -86
        shr     al, cl
        mov     byte ptr [rdi], al
        ret
```
but they could use `shlx` & `shrx` as gcc does, e.g.:
```asm
shl_u8:
        and     esi, 7
        mov     eax, 1
        shlx    esi, eax, esi
        mov     BYTE PTR [rdi], sil
        ret
```
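This matters even more in a loop. The legacy `shl`/`shr` forms overwrite the register holding the shifted value, so clang's version ends up reloading the constant on every iteration, whereas `shlx`/`shrx` write to a separate destination and can reuse a constant loaded once outside the loop. As a result, clang takes 4 uops per iteration on Haswell, versus 1 uop per iteration for gcc.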
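
For illustration, a loop form of the two functions might look like the sketch below. The names `shl_u8_loop` and `shr_u8_loop` and their signatures are hypothetical and may not match the code behind the godbolt link. Depending on compiler version and flags, the loops may also be auto-vectorized, in which case the scalar uop counts above would not apply directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop variant of shl_u8: the shifted constant (1) is
   loop-invariant, so a non-destructive shift could keep it in one register. */
void shl_u8_loop(uint8_t *dst, const uint64_t *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(1u << (c[i] & 7));
    }
}

/* Hypothetical loop variant of shr_u8: same idea with the constant 0xaa. */
void shr_u8_loop(uint8_t *dst, const uint64_t *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(0xaau >> (c[i] & 7));
    }
}
```

Only the shift count changes between iterations in these loops, which is the case where hoisting the constant out of the loop pays off.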

https://godbolt.org/z/Yc57PsWKE
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
