https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122791

            Bug ID: 122791
           Summary: Missed optimization with a loop that multiplies
                    counter by 2 until overflow
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

(I first reported this issue against Clang
(https://github.com/llvm/llvm-project/issues/168580),
but it seems that GCC also misses the optimization.)

```c
#include <limits.h>
extern void subroutine(unsigned long x);
void func1a(void) {
    unsigned long x = 1;
    while (1) {
        __asm__ ("" : "+r"(x));
        subroutine(x);
        if (x > ULONG_MAX / 2)
            break;
        x *= 2;
    }
}
void func1b(void) {
    unsigned long x = 1;
    while (1) {
        subroutine(x);
        if (__builtin_add_overflow(x, x, &x))
            break;
    }
}
#if 0
void func1c(void) {
    unsigned long x = 1;
    while (1) {
        subroutine(x);
        if (__builtin_mul_overflow(x, 2UL, &x))
            break;
    }
}
void func1a_orig(void) {
    unsigned long x = 1;
    while (1) {
        subroutine(x);
        if (x > ULONG_MAX / 2)
            break;
        x *= 2;
    }
}
#endif
```

x86-64 gcc 15.2 with `-Os` option produces
(https://godbolt.org/z/nxbEMcz4Y):

```assembly
func1a:
        pushq   %rbx
        movl    $1, %ebx
.L3:
        movq    %rbx, %rdi
        call    subroutine
        testq   %rbx, %rbx
        js      .L1
        addq    %rbx, %rbx
        jmp     .L3
.L1:
        popq    %rbx
        ret
func1b:
        pushq   %rbx
        movl    $1, %ebx
.L9:
        movq    %rbx, %rdi
        call    subroutine
        addq    %rbx, %rbx
        jnc     .L9
        popq    %rbx
        ret
```

While the conditional `(x > ULONG_MAX / 2)` can be converted into a "test if
sign bit is set" check, GCC misses that x is multiplied by 2 right afterward,
so the code could be smaller by doing the addition first and checking the
carry flag instead.
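
To spell out the reasoning: for an unsigned x, `x > ULONG_MAX / 2` holds exactly when the top bit of x is set, which is also exactly when `x + x` carries out. Below is a minimal sketch that checks this on a few boundary values. The helper names (`cond_compare`, `cond_signbit`, `cond_carry`, `conditions_agree`) are mine and only for illustration, and the sign-bit helper assumes `unsigned long` has no padding bits.

```c
#include <limits.h>

/* Hypothetical helpers, not part of the test case above. */

/* The comparison as written in func1a(). */
static int cond_compare(unsigned long x) {
    return x > ULONG_MAX / 2;
}

/* The "sign bit is set" test (testq/js in the -Os output).
   Assumes unsigned long has no padding bits. */
static int cond_signbit(unsigned long x) {
    return (x >> (sizeof(x) * CHAR_BIT - 1)) != 0;
}

/* The carry-out of x + x, as used in func1b() (addq/jnc). */
static int cond_carry(unsigned long x) {
    unsigned long sum;
    return __builtin_add_overflow(x, x, &sum);
}

/* Returns 1 if all three predicates agree on a few boundary values. */
int conditions_agree(void) {
    const unsigned long samples[] = {
        0, 1, 2, ULONG_MAX / 2, ULONG_MAX / 2 + 1, ULONG_MAX - 1, ULONG_MAX
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        unsigned long x = samples[i];
        if (cond_compare(x) != cond_signbit(x) ||
            cond_signbit(x) != cond_carry(x))
            return 0;
    }
    return 1;
}
```

Since the three conditions agree, the compare-then-double sequence in func1a() can in principle be folded into a single add followed by a carry check.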

func1a(), func1b() and func1c() in the example are all equivalent, so I expect
them to compile to the same code.
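
The equivalence claim can be checked with a small harness along these lines. This is only a sketch: `struct trace`, `record()`, the `loop_*` functions and `loops_equivalent()` are hypothetical names, and `record()` stands in for the external `subroutine()` so that the argument sequence can be compared.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical harness: every name here is illustrative only. */
enum { MAX_CALLS = sizeof(unsigned long) * CHAR_BIT + 1 };

struct trace {
    unsigned long values[MAX_CALLS];
    size_t count;
};

/* Stand-in for subroutine(): records each argument it is given. */
static void record(struct trace *t, unsigned long x) {
    if (t->count < MAX_CALLS)
        t->values[t->count] = x;
    t->count++;
}

/* Loop shape of func1a_orig(). */
static void loop_compare(struct trace *t) {
    unsigned long x = 1;
    while (1) {
        record(t, x);
        if (x > ULONG_MAX / 2)
            break;
        x *= 2;
    }
}

/* Loop shape of func1b(). */
static void loop_add_overflow(struct trace *t) {
    unsigned long x = 1;
    while (1) {
        record(t, x);
        if (__builtin_add_overflow(x, x, &x))
            break;
    }
}

/* Loop shape of func1c(). */
static void loop_mul_overflow(struct trace *t) {
    unsigned long x = 1;
    while (1) {
        record(t, x);
        if (__builtin_mul_overflow(x, 2UL, &x))
            break;
    }
}

/* Returns 1 if both traces hold the same sequence of arguments. */
static int same_trace(const struct trace *a, const struct trace *b) {
    if (a->count != b->count || a->count > MAX_CALLS)
        return 0;
    for (size_t i = 0; i < a->count; i++)
        if (a->values[i] != b->values[i])
            return 0;
    return 1;
}

/* Returns 1 if all three loops pass the same argument sequence. */
int loops_equivalent(void) {
    struct trace a = {0}, b = {0}, c = {0};
    loop_compare(&a);
    loop_add_overflow(&b);
    loop_mul_overflow(&c);
    return same_trace(&a, &b) && same_trace(&b, &c);
}
```

The loops differ only in the final value left in x (unchanged in the compare form, wrapped to 0 in the overflow forms), and x is dead after the loop, so that difference is not observable.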

I tested this issue on both x86-64 and ARM64 targets.
