https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121456

            Bug ID: 121456
           Summary: GCC doesn't fully utilize registers with known values
                    when making 'mov' instructions
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

This issue is easier to demonstrate with '-Os', but it sometimes happens
with '-O2' as well.

Test code:

```c
#include <stdint.h>

uint64_t func1a(uint64_t a, uint64_t b) {
    return a == 10 ? a : b;
}

uint64_t func1b(uint64_t a, uint64_t b) {
    return a == 10 ? 10 : b;
}

uint64_t func2a(uint64_t a, uint64_t b) {
    return a == 0x1234abcd ? a : b;
}

uint64_t func2b(uint64_t a, uint64_t b) {
    return a == 0x1234abcd ? 0x1234abcd : b;
}
```

Note that func1a and func1b are equivalent, and func2a and func2b are
equivalent.
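The equivalence claim can be spot-checked with a minimal sketch like the one below. The four functions are repeated from the test code so that the block compiles on its own; `check_equivalence` and the sample values are hypothetical additions for illustration and are not part of the original report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies of the four functions from the test code. */
uint64_t func1a(uint64_t a, uint64_t b) { return a == 10 ? a : b; }
uint64_t func1b(uint64_t a, uint64_t b) { return a == 10 ? 10 : b; }
uint64_t func2a(uint64_t a, uint64_t b) { return a == 0x1234abcd ? a : b; }
uint64_t func2b(uint64_t a, uint64_t b) { return a == 0x1234abcd ? 0x1234abcd : b; }

/* Hypothetical helper: asserts that each 'a' variant agrees with its
   'b' variant on a handful of sample inputs. The samples include both
   constants, their neighbours, and values whose low 32 bits match a
   constant but whose upper bits are set. */
void check_equivalence(void) {
    static const uint64_t samples[] = {
        0, 9, 10, 11,
        0x1234abcc, 0x1234abcd, 0x1234abce,
        0x10000000aULL, 0x11234abcdULL, UINT64_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint64_t a = samples[i];
            uint64_t b = samples[j];
            assert(func1a(a, b) == func1b(a, b));
            assert(func2a(a, b) == func2b(a, b));
        }
    }
}
```

Calling `check_equivalence()` from a `main` should finish without tripping an assertion. This only samples the input space; the equivalence itself follows from `a` being equal to the constant whenever the condition is true.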

(All tests below were done in Compiler Explorer.)

### Test 1, for ARM64 target

The ideal result is what's generated by Clang 20.1.0 (with '-O2' flag):

```assembly
func1a:
        cmp     x0, #10
        csel    x0, x0, x1, eq
        ret

func2a:
        mov     w8, #43981
        movk    w8, #4660, lsl #16
        cmp     x0, x8
        csel    x0, x0, x1, eq
        ret
```

In gcc 15.1 (with '-O2' flag) this is generated instead:

```assembly
func1a:
        cmp     x0, 10
        mov     x0, 10
        csel    x0, x1, x0, ne
        ret
// func1b assembly is same as func1a

func2a:
        mov     x2, 43981
        movk    x2, 0x1234, lsl 16
        cmp     x0, x2
        csel    x0, x1, x0, ne
        ret
func2b:
        mov     x2, 43981
        movk    x2, 0x1234, lsl 16
        cmp     x0, x2
        csel    x0, x0, x1, eq
        ret
```

Note that (a) there's an unneeded 'mov' instruction in func1a, and (b) although
func2a and func2b have the same code size, their use of the 'csel' instruction
is not identical (I suspect that means the code is not canonicalized to be the
same).

When compiled with gcc 15.1 with the '-Os' flag, func1b gets the ideal result,
but its code then differs from func1a, which still has the unneeded 'mov'
instruction.

### Test 2, for x86-64 target

The ideal result is what's generated by Clang 20.1.0 (with '-O2' flag):

```assembly
.intel_syntax
func1a:
        mov     rax, rsi
        cmp     rdi, 10
        cmove   rax, rdi
        ret

func2a:
        mov     rax, rsi
        cmp     rdi, 305441741
        cmove   rax, rdi
        ret
```

In gcc 15.1 (with '-Os' flag) this is generated instead:

```assembly
func1a:
        cmp     rdi, 10
        mov     eax, 10
        cmovne  rax, rsi
        ret
func1b:
        cmp     rdi, 10
        mov     rax, rsi
        cmove   rax, rdi
        ret
func2a:
        cmp     rdi, 305441741
        mov     eax, 305441741
        cmovne  rax, rsi
        ret
func2b:
        cmp     rdi, 305441741
        mov     rax, rsi
        cmove   rax, rdi
        ret
```

(a) GCC converts a register 'mov' instruction into an immediate-operand 'mov'
in the func1a and func2a cases (which suggests GCC knows that the variable 'a'
has a fixed value after the equality check). However,
(b) GCC misses that the 'a' variants of both functions can be converted to the
'b' variants. The 'b' variants have the ideal size I want for '-Os'.

So in both tests, on both architectures, GCC doesn't fully utilize registers
that hold known, fixed values when planning register move instructions.
