https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122315

            Bug ID: 122315
           Summary: Extraneous code generated when length is checked in
                    memory comparison
           Product: gcc
           Version: 15.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matt at godbolt dot org
  Target Milestone: ---

I was investigating `std::string_view` comparisons and found unusual and
suboptimal code generation. There's some clever overlapping reads being done
(yay) but later the code generator generates surprising code:

The code is:

```
#include <string_view>
#include <cstring>

using namespace std::literals;

bool t7(std::string_view sv) { return sv == "ABCDEFG"sv; }

bool matches_1(std::size_t length, const char *s) { 
    if (length != 7) return false;
    return std::memcmp(s, "ABCDEFG", length) == 0;
}

bool matches_2(std::size_t length, const char *s) { 
    // _assume_ length is 7 here:
    return std::memcmp(s, "ABCDEFG", 7) == 0;
}
```

Here "t7" is the actual function I spotted doing odd things:

```asm
t7:
  xor eax, eax              ; return value = false
  cmp rdi, 7                ; length 7?
  je .L40                   ; if yes, check contents
  ret                       ; else, return false
.L40:
  ; compare the first 4 bytes with 1145258561 == "ABCD"
  cmp DWORD PTR [rsi], 1145258561
  je .L41                   ; if equal, jump to 41
.L37:                       ; else...
  mov eax, 1                ; set eax to 1
.L38:                       ; falls through to...
  test eax, eax             ; check eax?
  sete al                   ; eax = 1 or 0
  ret                       ; return
.L41:
  xor eax, eax              ; set eax 0
  ; Compare the last 4 bytes with 1195787588 == "DEFG"
  cmp DWORD PTR [rsi+3], 1195787588 
  ; if not equal, go to L37 (which returns 1)
  jne .L37
  ; else go to L38 (which returns 0)
  jmp .L38
```

Specifically there's some setting of eax to 0 or 1, then a fall-through/jump to
a `test eax, eax/sete al` which seems suboptimal: the compiler could reasonably
just delete the `test` and `sete` - as we know all paths that lead to L38 have
exactly 0 or 1 in `eax`.

This also happens in the simpler equivalent "matches_1" function which uses an
explicit length check followed by a memcmp.

Simplifying further, in matches_2, we see the same pattern, perhaps even more
surprisingly having `mov eax, 1` / `test eax, eax` / `sete al` in the same
basic block (!?)

I can't think why this is needed, so I surmise it's a code generation issue.
Clang generates different code, but it's more in line with what I'd expect.

CE link: https://godbolt.org/z/xEW71fvsY

Reply via email to