| Field | Value |
| --- | --- |
| Issue | 160710 |
| Summary | Suboptimal codegen: redundant zero-extend instruction in this simple C++ function |
| Labels | new issue |
| Assignees | |
| Reporter | 00001H |

Take this simple C++ function, which counts the holes in the digits of a string (0, 4, 6, and 9 each have one hole; 8 has two):
```cpp
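#include <cstdint>
#include <array>

using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;

ui countholes(const char* s) {
    // Holes per digit: 0, 4, 6, 9 have one hole; 8 has two.
    constexpr static std::array<ui, 10> pre_table{1, 0, 0, 0, 1, 0, 1, 0, 2, 1};
    uc c;
    ui tot = 0;
    while (true) {
        c = uc(*s++);
        // Stops at the terminating '\0' (or any byte below '0'); there is no upper-bound check.
        if (c < uc('0')) break;
        tot += pre_table[c - uc('0')];
    }
    return tot;
}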
```
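As a quick sanity check of what the function is meant to compute, here is a minimal sketch that reproduces `countholes` unchanged and exercises it on a few digit-only strings. The helper name `run_examples` and the sample inputs are hypothetical and not part of the report; the inputs deliberately contain only the characters `'0'` to `'9'`, since the function does not check for bytes above `'9'`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;

// Same logic as the reported function, repeated so this sketch compiles on its own.
ui countholes(const char* s) {
    constexpr static std::array<ui, 10> pre_table{1, 0, 0, 0, 1, 0, 1, 0, 2, 1};
    ui tot = 0;
    while (true) {
        uc c = uc(*s++);
        if (c < uc('0')) break;  // the terminating '\0' ends the loop
        tot += pre_table[c - uc('0')];
    }
    return tot;
}

// Hypothetical helper: checks a few hand-computed hole counts.
void run_examples() {
    assert(countholes("") == 0);      // empty string: no digits
    assert(countholes("123") == 0);   // 1, 2, 3 have no holes
    assert(countholes("8") == 2);     // 8 has two holes
    assert(countholes("4690") == 4);  // one hole each
    assert(countholes("8809") == 6);  // 2 + 2 + 1 + 1
}
```

Calling `run_examples()` from a `main` function is enough to run the checks.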
For `countholes`, clang [generates](https://godbolt.org/z/zxdezbE5K):
```asm
countholes:
        movzx   ecx, byte ptr [rdi]
        cmp     cl, 48
        jae     .LBB0_3
        xor     eax, eax
        ret
.LBB0_3:
        inc     rdi
        xor     eax, eax
        lea     rdx, [rip + countholes::pre_table]
.LBB0_4:
        movzx   ecx, cl
        add     ecx, -48
        add     rax, qword ptr [rdx + 8*rcx]
        movzx   ecx, byte ptr [rdi]
        inc     rdi
        cmp     cl, 47
        ja      .LBB0_4
        ret
countholes::pre_table:
        .quad   1
        .quad   0
        .quad   0
        .quad   0
        .quad   1
        .quad   0
        .quad   1
        .quad   0
        .quad   2
        .quad   1
```
Pay attention to the `movzx ecx, cl` instruction right after `.LBB0_4`. It zero-extends `cl` into the `ecx` register. However, on every path into `.LBB0_4`, the last write to `rcx` is a `movzx ecx, byte ptr [rdi]`, and a 32-bit register write also clears the upper 32 bits, so bits 8-63 of `rcx` are already zero whenever that instruction executes. (The `add ecx, -48` inside the loop does modify `ecx`, but `ecx` is reloaded from memory before the loop branches back.) That means the instruction in question is redundant and can be elided. (I do not believe removing it would introduce partial register stalls, because the only use of `cl` is in `cmp`, which only reads it.)
[This godbolt shows that the code behaves identically with or without the instruction.](https://godbolt.org/z/WTccPbMab)
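
The redundancy argument can also be sketched as a small register model. This is only an illustration under stated assumptions: the function names `movzx_from_byte`, `movzx_ecx_cl`, and `second_movzx_is_noop` are hypothetical, `rcx` is modeled as a plain 64-bit integer, and the check requires C++14 or later for the `constexpr` loop. It shows that once `rcx` has been written by a byte-load `movzx`, a following `movzx ecx, cl` cannot change it.

```cpp
#include <cassert>
#include <cstdint>

// Models `movzx ecx, byte ptr [mem]`: the loaded byte is zero-extended into ecx,
// and the 32-bit write clears bits 32-63 of rcx as well.
constexpr std::uint64_t movzx_from_byte(std::uint8_t b) {
    return b;
}

// Models `movzx ecx, cl`: keep the low 8 bits of rcx and clear everything above.
constexpr std::uint64_t movzx_ecx_cl(std::uint64_t rcx) {
    return rcx & 0xFF;
}

// Returns true if the register-to-register movzx leaves rcx unchanged
// for every possible value of the loaded byte.
constexpr bool second_movzx_is_noop() {
    for (unsigned b = 0; b < 256; ++b) {
        const std::uint64_t rcx = movzx_from_byte(static_cast<std::uint8_t>(b));
        if (movzx_ecx_cl(rcx) != rcx) return false;
    }
    return true;
}

// Checked at compile time: the second zero-extension never changes the register.
static_assert(second_movzx_is_noop(), "movzx ecx, cl is a no-op after a byte-load movzx");
```

The model covers only the value held in the register; it says nothing about instruction latency or how the backend tracks known-zero bits.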