https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122829
Bug ID: 122829
Summary: Optimization: Don't waste a call-preserved register if a variable's value can be trivially derived
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---
(There may be related reports in the GCC bug tracker, but I didn't find an exact duplicate of this issue. I'm also not sure whether, in the backend, these two missed-optimization cases should be treated as one issue or as two.)
I'm presenting two cases in which a local variable's value can be trivially derived, so it is a waste of a call-preserved register (or stack space) for the compiler to preserve that temporary value across function calls.
* `func1`: `value` is simply loaded from the object pointed to by `ptr`. `ptr` can be kept in a call-preserved register, but there is no need to also keep the loaded `value`.
* `func2`: `x` is simply `value * 2`. `value` can be kept in a call-preserved register, but there is no need to preserve `x`, because `value * 2` can be trivially recomputed (e.g. with a single LEA instruction on x86).
(GCC already optimizes the `func3` case correctly: it knows that `x = value + 3` does not need to be preserved across the call. I include `func3` here for comparison with `func2`.)
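A minimal sketch for checking the hand-written `func1_b` against `func1_a` at run time. It assumes that `subroutine1` never writes through `ptr`: reloading `ptr[0]` gives the same result as reusing `value` only under that assumption. The recording stub `subroutine1` and the helper `func1_same_calls` are hypothetical and not part of the report; `func1_a` and `func1_b` are copied from the test case below.

```c
#include <assert.h>

/* Hypothetical recording stub (not from the report): stores the arguments of
   each subroutine1 call. By assumption it never writes through ptr; func1_b
   is only equivalent to func1_a under that assumption. */
#define MAX_CALLS 8
static int *log_ptr[MAX_CALLS];
static int log_value[MAX_CALLS];
static int n_calls;

void subroutine1(int *ptr, int value) {
    assert(n_calls < MAX_CALLS);
    log_ptr[n_calls] = ptr;
    log_value[n_calls] = value;
    n_calls++;
}

/* Copy of func1_a: the loaded value is kept live across both calls. */
int func1_a(int *ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, value);
    return value;
}

/* Copy of func1_b: ptr[0] is reloaded from memory instead. */
int func1_b(int *ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, ptr[0]);
    return ptr[0];
}

/* Hypothetical helper: returns 1 if func1_a and func1_b return the same value
   and make the same sequence of subroutine1 calls for an object that holds
   `initial`. */
int func1_same_calls(int initial) {
    int obj = initial;
    int *ps[MAX_CALLS];
    int vs[MAX_CALLS];

    n_calls = 0;
    int ra = func1_a(&obj);
    int na = n_calls;
    for (int i = 0; i < na; i++) {
        ps[i] = log_ptr[i];
        vs[i] = log_value[i];
    }

    n_calls = 0;
    int rb = func1_b(&obj);
    if (ra != rb || na != n_calls)
        return 0;
    for (int i = 0; i < na; i++) {
        if (ps[i] != log_ptr[i] || vs[i] != log_value[i])
            return 0;
    }
    return 1;
}
```

For example, `func1_same_calls(42)` returns 1: both variants call `subroutine1(&obj, 42)` twice and return 42.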
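```c
extern void subroutine1(int* ptr, int value);

/* func1_a: `value` is kept live across both calls. */
int func1_a(int* ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, value);
    return value;
}

/* func1_b: hand-written variant that reloads ptr[0] instead of keeping `value`. */
int func1_b(int* ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, ptr[0]);
    return ptr[0];
}

extern void subroutine2(int x, int y);

/* func2_a: `x = value * 2` is kept live across both calls. */
int func2_a(int value) {
    int x = value * 2;
    subroutine2(value, x);
    subroutine2(value, x);
    return value;
}

/* func2_b: hand-written variant that recomputes `value * 2`. The empty asm
   statement stops GCC from reusing the first product. */
int func2_b(int value) {
    int x = value * 2;
    subroutine2(value, x);
    __asm__ ("" : "+r"(value));
    subroutine2(value, value * 2);
    return value;
}

#if 0
/* func3_a (disabled): GCC already avoids preserving `x` here. */
int func3_a(int value) {
    int x = value + 3;
    subroutine2(value, x);
    subroutine2(value, x + 5);
    return value;
}
#endif
```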
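A minimal sketch showing that recomputing `value * 2`, as `func2_b` does, is observably the same as keeping `x` live, as `func2_a` does: both make identical `subroutine2` calls and return the same value. The recording stub `subroutine2` and the helper `func2_same_calls` are hypothetical. The copied `func2_b` omits the empty `__asm__` statement, which emits no instructions and only serves to stop GCC from reusing the first product.

```c
#include <assert.h>

/* Hypothetical recording stub (not from the report): stores the arguments of
   each subroutine2 call so two call sequences can be compared. */
#define MAX_CALLS 8
static int log_x[MAX_CALLS];
static int log_y[MAX_CALLS];
static int n_calls;

void subroutine2(int x, int y) {
    assert(n_calls < MAX_CALLS);
    log_x[n_calls] = x;
    log_y[n_calls] = y;
    n_calls++;
}

/* Copy of func2_a: x = value * 2 is computed once and reused. */
int func2_a(int value) {
    int x = value * 2;
    subroutine2(value, x);
    subroutine2(value, x);
    return value;
}

/* Copy of func2_b without the asm barrier: value * 2 is written out again. */
int func2_b(int value) {
    int x = value * 2;
    subroutine2(value, x);
    subroutine2(value, value * 2);
    return value;
}

/* Hypothetical helper: returns 1 if func2_a and func2_b return the same value
   and make the same sequence of subroutine2 calls for this input. */
int func2_same_calls(int value) {
    int xs[MAX_CALLS];
    int ys[MAX_CALLS];

    n_calls = 0;
    int ra = func2_a(value);
    int na = n_calls;
    for (int i = 0; i < na; i++) {
        xs[i] = log_x[i];
        ys[i] = log_y[i];
    }

    n_calls = 0;
    int rb = func2_b(value);
    if (ra != rb || na != n_calls)
        return 0;
    for (int i = 0; i < na; i++) {
        if (xs[i] != log_x[i] || ys[i] != log_y[i])
            return 0;
    }
    return 1;
}
```

For example, `func2_same_calls(21)` returns 1: both variants call `subroutine2(21, 42)` twice and return 21.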
([Compiler Explorer link](https://godbolt.org/z/v7xn1YMbK))
x86-64 GCC 15.2 with `-Os` produces:
```assembly
func1_a:
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        pushq   %rax
        movl    (%rdi), %ebx
        movl    %ebx, %esi
        call    subroutine1
        movl    %ebx, %esi
        movq    %rbp, %rdi
        call    subroutine1
        movl    %ebx, %eax
        popq    %rdx
        popq    %rbx
        popq    %rbp
        ret

func1_b:
        pushq   %rbx
        movl    (%rdi), %esi
        movq    %rdi, %rbx
        call    subroutine1
        movl    (%rbx), %esi
        movq    %rbx, %rdi
        call    subroutine1
        movl    (%rbx), %eax
        popq    %rbx
        ret

func2_a:
        pushq   %rbp
        leal    (%rdi,%rdi), %ebp
        pushq   %rbx
        movl    %ebp, %esi
        movl    %edi, %ebx
        pushq   %rax
        call    subroutine2
        movl    %ebp, %esi
        movl    %ebx, %edi
        call    subroutine2
        movl    %ebx, %eax
        popq    %rdx
        popq    %rbx
        popq    %rbp
        ret

func2_b:
        pushq   %rbx
        leal    (%rdi,%rdi), %esi
        movl    %edi, %ebx
        call    subroutine2
        leal    (%rbx,%rbx), %esi
        movl    %ebx, %edi
        call    subroutine2
        movl    %ebx, %eax
        popq    %rbx
        ret
```
(Note: I've also tested the AArch64 target, and it shows the same missed optimization, i.e. a wasted call-preserved register.)