https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122829

            Bug ID: 122829
           Summary: Optimization: Don't waste a call-preserved register if
                    a variable's value can be trivially derived
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

(There may be related bug reports in the GCC bug tracker, but I didn't find an
exact duplicate of this issue. Also, I'm not sure whether, internally, these
two missed-optimization cases should be treated as one issue to be fixed or as
two.)

I'm presenting two cases where a local variable's value can be trivially
derived, so preserving its temporary value across function calls wastes a
call-preserved register (or stack space).

* `func1`: `value` is simply loaded from the object pointed to by a pointer
(`ptr`). The `ptr` can be kept in a call-preserved register, but there's no
need to keep the temporary result of `value`.
* `func2`: `x` is simply `value * 2`. While `value` can be kept in a
call-preserved register, there is no need to preserve `x`, as `value * 2` can
be trivially recalculated (e.g. with an LEA instruction on x86).

(GCC is able to correctly optimize the `func3` case, where it knows `x = value
+ 3` does not need to be preserved. I show the `func3` case here for
comparison with the `func2` case.)

```c
extern void subroutine1(int* ptr, int value);
int func1_a(int* ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, value);
    return value;
}
int func1_b(int* ptr) {
    int value = ptr[0];
    subroutine1(ptr, value);
    subroutine1(ptr, ptr[0]);
    return ptr[0];
}

extern void subroutine2(int x, int y);
int func2_a(int value) {
    int x = value * 2;
    subroutine2(value, x);
    subroutine2(value, x);
    return value;
}
int func2_b(int value) {
    int x = value * 2;
    subroutine2(value, x);
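    /* Empty asm marks `value` as possibly modified, so GCC cannot reuse `x`
       and has to recompute `value * 2` for the second call. */
    __asm__ ("" : "+r"(value));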
    subroutine2(value, value * 2);
    return value;
}

#if 0
int func3_a(int value) {
    int x = value + 3;
    subroutine2(value, x);
    subroutine2(value, x + 5);
    return value;
}
#endif
```

([Compiler Explorer link](https://godbolt.org/z/v7xn1YMbK))

x86-64 GCC 15.2 with the `-Os` option produces:

```assembly
func1_a:
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        pushq   %rax
        movl    (%rdi), %ebx
        movl    %ebx, %esi
        call    subroutine1
        movl    %ebx, %esi
        movq    %rbp, %rdi
        call    subroutine1
        movl    %ebx, %eax
        popq    %rdx
        popq    %rbx
        popq    %rbp
        ret
func1_b:
        pushq   %rbx
        movl    (%rdi), %esi
        movq    %rdi, %rbx
        call    subroutine1
        movl    (%rbx), %esi
        movq    %rbx, %rdi
        call    subroutine1
        movl    (%rbx), %eax
        popq    %rbx
        ret
func2_a:
        pushq   %rbp
        leal    (%rdi,%rdi), %ebp
        pushq   %rbx
        movl    %ebp, %esi
        movl    %edi, %ebx
        pushq   %rax
        call    subroutine2
        movl    %ebp, %esi
        movl    %ebx, %edi
        call    subroutine2
        movl    %ebx, %eax
        popq    %rdx
        popq    %rbx
        popq    %rbp
        ret
func2_b:
        pushq   %rbx
        leal    (%rdi,%rdi), %esi
        movl    %edi, %ebx
        call    subroutine2
        leal    (%rbx,%rbx), %esi
        movl    %ebx, %edi
        call    subroutine2
        movl    %ebx, %eax
        popq    %rbx
        ret
```

(Note: I've also tested with an AArch64 target, and it has the same missed
optimization, that is, it wastes a call-preserved register.)
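
As a sanity check that the hand-written `func2_b` really is a drop-in
replacement for `func2_a`, here is a minimal sketch of a behavioral test. The
recording stub for `subroutine2`, the `call_record` type, and the
`func2_variants_agree` helper are assumptions made for this illustration; the
test case above only declares `subroutine2` as `extern`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recording stub; the report only declares subroutine2. */
enum { MAX_CALLS = 4 };
struct call_record { int x, y; };
static struct call_record call_log[MAX_CALLS];
static size_t call_count;

void subroutine2(int x, int y) {
    assert(call_count < MAX_CALLS);
    call_log[call_count].x = x;
    call_log[call_count].y = y;
    call_count++;
}

/* Original form from the report: x is computed once and reused. */
int func2_a(int value) {
    int x = value * 2;
    subroutine2(value, x);
    subroutine2(value, x);
    return value;
}

/* Hand-rematerialized form from the report: value * 2 is recomputed. */
int func2_b(int value) {
    int x = value * 2;
    subroutine2(value, x);
    __asm__ ("" : "+r"(value));
    subroutine2(value, value * 2);
    return value;
}

/* Returns 1 when both variants make identical calls and return the
   same value, 0 otherwise. */
int func2_variants_agree(int value) {
    struct call_record seen_a[2];
    int ret_a, ret_b;

    call_count = 0;
    ret_a = func2_a(value);
    if (call_count != 2) return 0;
    seen_a[0] = call_log[0];
    seen_a[1] = call_log[1];

    call_count = 0;
    ret_b = func2_b(value);
    if (call_count != 2) return 0;

    return ret_a == ret_b
        && seen_a[0].x == call_log[0].x && seen_a[0].y == call_log[0].y
        && seen_a[1].x == call_log[1].x && seen_a[1].y == call_log[1].y;
}
```

Calling `func2_variants_agree(21)` should return 1. This only checks
observable behavior. Because the stub is visible in the same translation unit,
the compiler may inline it, so this sketch is not a way to reproduce the code
generation shown above.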
