| Issue | 113965 |
| Summary | [x86-64] Avoid usage of multi-uop CMOVBE/CMOVNBE |
| Labels | new issue |
| Assignees | |
| Reporter | daniel-zabawa |
On P-cores, the CMOVBE and CMOVNBE (also spelled CMOVA) instructions each generate 2 uops and have a throughput of 1 per cycle. The other CMOVs are a single uop with a throughput of 2 per cycle.
The following case shows the backend generating the more expensive CMOVBE/CMOVA instructions:
```
// file f.c
int f(int x) {
  if (x < 2)
    return x;
  long long int l = 1;
  long long int u = x;
  do {
    long long int m = (l + u) >> 1;
    if (m*m > x) u=m; else l=m;
  } while (l+1 < u);
  return (int)l;
}
```
Compiling the above with trunk as `clang -O2 -march=core-avx2 -S f.c` generates:
```
f(int):
        mov     eax, edi
        cmp     edi, 2
        jl      .LBB0_3
        mov     ecx, eax
        mov     eax, 1
        mov     rdx, rcx
.LBB0_2:
        lea     rsi, [rdx + rax]
        sar     rsi
        mov     rdi, rsi
        imul    rdi, rsi
        cmp     rdi, rcx
        cmovbe  rax, rsi
        cmova   rdx, rsi
        lea     rsi, [rax + 1]
        cmp     rsi, rdx
        jl      .LBB0_2
.LBB0_3:
        ret
```
The `cmovge` and `cmovl` instructions should be preferred to these where possible.
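
One way to read the request: the select on `m*m > x` is the same predicate as `x < m*m` with the operands swapped, and the swapped form pairs naturally with a compare followed by `cmovl`/`cmovge`. The sketch below only illustrates that the two forms are equivalent at the source level. The names `f_original`, `f_swapped`, and `check_range` are hypothetical, and nothing here guarantees which CMOV the backend actually selects for either version.

```c
#include <assert.h>

/* The loop from the report: the select is keyed on m*m > x. */
static int f_original(int x) {
    if (x < 2)
        return x;
    long long l = 1;
    long long u = x;
    do {
        long long m = (l + u) >> 1;
        if (m * m > x) u = m; else l = m;
    } while (l + 1 < u);
    return (int)l;
}

/* Hypothetical variant: same predicate with the operands swapped
   (x < m*m). This is the shape that corresponds to cmp + cmovl/cmovge,
   but the instruction choice is still up to the compiler. */
static int f_swapped(int x) {
    if (x < 2)
        return x;
    long long l = 1;
    long long u = x;
    long long bound = x;
    do {
        long long m = (l + u) >> 1;
        int below = bound < m * m;  /* signed "less than" */
        u = below ? m : u;          /* taken when x <  m*m */
        l = below ? l : m;          /* taken when x >= m*m */
    } while (l + 1 < u);
    return (int)l;
}

/* Asserts that both versions agree on every x in [lo, hi].
   hi must be below INT_MAX so that ++x cannot overflow. */
static void check_range(int lo, int hi) {
    for (int x = lo; x <= hi; ++x)
        assert(f_original(x) == f_swapped(x));
}
```

Calling `check_range(-5, 10000)` from a test driver exercises both the early-return path and the loop. Comparing `clang -O2 -S` output for the two functions would show whether the swapped form changes the selected CMOVs on a given compiler version.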