https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122315
Bug ID: 122315
Summary: Extraneous code generated when length is checked in
memory comparison
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: matt at godbolt dot org
Target Milestone: ---
I was investigating `std::string_view` comparisons and found unusual and
suboptimal code generation. There's some clever overlapping reads being done
(yay) but later the code generator generates surprising code:
The code is:
```
#include <string_view>
#include <cstring>
using namespace std::literals;
bool t7(std::string_view sv) { return sv == "ABCDEFG"sv; }
bool matches_1(std::size_t length, const char *s) {
if (length != 7) return false;
return std::memcmp(s, "ABCDEFG", length) == 0;
}
bool matches_2(std::size_t length, const char *s) {
// _assume_ length is 7 here:
return std::memcmp(s, "ABCDEFG", 7) == 0;
}
```
Here "t7" is the actual function I spotted doing odd things:
```asm
t7:
xor eax, eax ; return value = false
cmp rdi, 7 ; length 7?
je .L40 ; if yes, check contents
ret ; else, return false
.L40:
; compare the first 4 bytes with 1145258561 == "ABCD"
cmp DWORD PTR [rsi], 1145258561
je .L41 ; if equal, jump to 41
.L37: ; else...
mov eax, 1 ; set eax to 1
.L38: ; falls through to...
test eax, eax ; check eax?
sete al ; eax = 1 or 0
ret ; return
.L41:
xor eax, eax ; set eax 0
; Compare the last 4 bytes with 1195787588 == "DEFG"
cmp DWORD PTR [rsi+3], 1195787588
; if not equal, go to L37 (which returns 1)
jne .L37
; else go to L38 (which returns 0)
jmp .L38
```
Specifically there's some setting of eax to 0 or 1, then a fall-through/jump to
a `test eax, eax/sete al` which seems suboptimal: the compiler could reasonably
just delete the `test` and `sete` - as we know all paths that lead to L38 have
exactly 0 or 1 in `eax`.
This also happens in the simpler equivalent "matches_1" function which uses an
explicit length check followed by a memcmp.
Simplifying further, in matches_2, we see the same pattern, perhaps even more
surprisingly having `mov eax, 1` / `test eax, eax` / `sete al` in the same
basic block (!?)
I can't think why this is needed, so I surmise it's a code generation issue.
Clang generates different code, but it's more in line with what I'd expect.
CE link: https://godbolt.org/z/xEW71fvsY