https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125221

            Bug ID: 125221
           Summary: x86-64 optimization: Use 32-bit popcnt instruction if
                    operand is known to fit 32 bit integer
           Product: gcc
           Version: 16.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

x86_64 target, with `-Os -mpopcnt` compiler flags.

__builtin_popcountl(), __builtin_popcountll() and __builtin_popcountg() should
emit a 32-bit popcnt instruction if the operand is known to fit a 32-bit
unsigned int. This saves one byte of machine code (typically the REX.W prefix
that the 64-bit form requires).


```c
int func1(unsigned long long x) {
    return __builtin_popcountll(x & 0x87878787);
}
int func2(unsigned long long x) {
    return __builtin_popcountll(x >> 32);
}
int func3(unsigned char x) {
    if (x >= 64) {
        __builtin_unreachable();
    }
    return __builtin_popcountll(0x87878787ULL >> x);
}
int func4(unsigned long long x) {
    return __builtin_popcountll(x % 4294967291U);
}
int func5(unsigned long long x) {
    return __builtin_popcountll(x / 4294967311U);
}
int func6(unsigned long long x) {
    if (x > 0xFFFFFFFFULL) {
        __builtin_unreachable();
    }
    return __builtin_popcountll(x);
}
```

Compiler Explorer link:
https://godbolt.org/z/ohen3WhGW

GCC seems to know that in all six example functions, the values passed to
__builtin_popcountll() would not exceed 0xFFFFFFFF, so the 32-bit popcnt
instruction can be used (rather than 64-bit).

(Clang 22.1.0 optimizes func1 through func5, but misses func6.)
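
Until the compiler narrows the operation itself, the same effect can be
requested by hand at the source level. The following is only a minimal sketch
of that workaround, not something taken from the Compiler Explorer link: the
names `popcount_fits32`, `func1_narrow` and `func2_narrow` are hypothetical,
and it assumes the caller can prove that the upper 32 bits are zero, in which
case the cast to unsigned int discards no set bits and __builtin_popcount()
should select the 32-bit instruction.

```c
#include <assert.h>

/* Hypothetical helper (not part of the report): popcount of a 64-bit value
   that the caller guarantees fits in 32 bits. */
static inline int popcount_fits32(unsigned long long x) {
    /* Precondition: the upper 32 bits are zero. */
    assert(x <= 0xFFFFFFFFULL);
    /* The cast drops only zero bits, so the bit count is unchanged. */
    return __builtin_popcount((unsigned int)x);
}

/* Hand-narrowed variants of func1 and func2 from the report. */
int func1_narrow(unsigned long long x) {
    return popcount_fits32(x & 0x87878787);
}
int func2_narrow(unsigned long long x) {
    return popcount_fits32(x >> 32);
}
```

This only works where the range is obvious to the programmer; the point of
the request is that GCC already has the range information and could apply the
narrowing without source changes.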
