https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123262
Bug ID: 123262
Summary: x86 optimization: Missed combining sub and cmp on
subtract overflow patterns
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---
This is more of an optimization feature request than a bug, but I believe these
patterns for "subtract and overflow check" are common, and GCC missing these
could be annoying (at least to me).
Test code
```c
#define unlikely(x) __builtin_expect(!!(x), 0)

unsigned long func1_a(unsigned long x, unsigned long y) {
    if (unlikely(x < y))
        return 0x123;
    return x - y;
}

unsigned long func1_b(unsigned long x, unsigned long y) {
    if (unlikely(x - y > x))
        return 0x123;
    return x - y;
}

unsigned long func1_ideal(unsigned long x, unsigned long y) {
    if (__builtin_usubl_overflow(x, y, &x))
        return 0x123;
    return x;
}

void func2_a(unsigned long x, unsigned long y) {
    while (1) {
        __asm__ ("" ::: "memory");
        if (unlikely(x < y))
            break;
        x -= y;
    }
}

void func2_b(unsigned long x, unsigned long y) {
    while (1) {
        __asm__ ("" ::: "memory");
        if (unlikely(x - y > x))
            break;
        x -= y;
    }
}

void func2_ideal(unsigned long x, unsigned long y) {
    while (1) {
        __asm__ ("" ::: "memory");
        if (__builtin_usubl_overflow(x, y, &x))
            break;
    }
}
```
Compiler Explorer link: https://godbolt.org/z/9TKbxna85
x86-64 gcc 15.2 with `-Os` option:
```assembly
func1_a:
        movq    %rdi, %rax
        movl    $291, %edx
        subq    %rsi, %rax
        cmpq    %rsi, %rdi
        cmovb   %rdx, %rax
        ret
func1_ideal:
        subq    %rsi, %rdi
        movl    $291, %eax
        cmovnb  %rdi, %rax
        ret
func2_a:
.L14:
        cmpq    %rsi, %rdi
        jb      .L12
        subq    %rsi, %rdi
        jmp     .L14
.L12:
        ret
func2_ideal:
.L20:
        subq    %rsi, %rdi
        jnb     .L20
        ret
```
Note that in the test code above I specifically added the "unlikely" macro as
an optimization hint, and compiled the code with `-Os`. I can think of a few
reasons GCC would choose not to combine the `sub` and `cmp` instructions into
one, so I suggest this optimization be performed at `-Os` or `-Oz`, or when
the (x < y) overflow condition is marked as "unlikely".
Clang can perform the optimization in the func1 case (but it misses it in
func2, see https://github.com/llvm/llvm-project/issues/170675 ).
The func1 case may also get optimized in GCC for ARM64 (except there's a minor
issue that I've reported in bug 123009). The optimization is not yet done for
x86-64.
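
For reference, the request assumes that the three source forms in the test code detect the same condition: `x < y`, `x - y > x`, and the return value of `__builtin_usubl_overflow` are all true exactly when the unsigned subtraction borrows. The following is a minimal sketch of a check for that equivalence. It is not part of the original test case, and the helper names (`borrows_a`, `borrows_b`, `borrows_builtin`, `forms_agree`, `check_forms`) are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Form (a): compare before subtracting. */
static int borrows_a(unsigned long x, unsigned long y) {
    return x < y;
}

/* Form (b): subtract first, then detect wraparound. */
static int borrows_b(unsigned long x, unsigned long y) {
    return x - y > x;
}

/* Builtin form: nonzero if x - y does not fit in unsigned long. */
static int borrows_builtin(unsigned long x, unsigned long y) {
    unsigned long diff;
    return __builtin_usubl_overflow(x, y, &diff);
}

/* Hypothetical helper: 1 if all three forms give the same answer. */
static int forms_agree(unsigned long x, unsigned long y) {
    int a = borrows_a(x, y);
    return a == borrows_b(x, y) && a == borrows_builtin(x, y);
}

/* Checks every pair drawn from a handful of boundary values. */
static void check_forms(void) {
    static const unsigned long samples[] = {
        0, 1, 2, 0x123, ULONG_MAX / 2, ULONG_MAX / 2 + 1,
        ULONG_MAX - 1, ULONG_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(forms_agree(samples[i], samples[j]));
}
```

Calling `check_forms()` from a test driver exercises the boundary values only; it is a sanity check, not a proof of equivalence.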