https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125221
Bug ID: 125221
Summary: x86-64 optimization: Use 32-bit popcnt instruction if
operand is known to fit 32 bit integer
Product: gcc
Version: 16.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---
x86_64 target, with `-Os -mpopcnt` compiler flags.
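__builtin_popcountl(), __builtin_popcountll() and __builtin_popcountg() should
emit a 32-bit popcnt instruction if the operand is known to fit in a 32-bit
unsigned int. This saves one byte of machine code (the REX.W prefix) when the
registers involved do not otherwise require a REX prefix.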
```c
int func1(unsigned long long x) {
    return __builtin_popcountll(x & 0x87878787);
}

int func2(unsigned long long x) {
    return __builtin_popcountll(x >> 32);
}

int func3(unsigned char x) {
    if (x >= 64) {
        __builtin_unreachable();
    }
    return __builtin_popcountll(0x87878787ULL >> x);
}

int func4(unsigned long long x) {
    return __builtin_popcountll(x % 4294967291U);
}

int func5(unsigned long long x) {
    return __builtin_popcountll(x / 4294967311U);
}

int func6(unsigned long long x) {
    if (x > 0xFFFFFFFFULL) {
        __builtin_unreachable();
    }
    return __builtin_popcountll(x);
}
```
Compiler Explorer link:
https://godbolt.org/z/ohen3WhGW
GCC seems to know that, in all six example functions, the value passed to
__builtin_popcountll() cannot exceed 0xFFFFFFFF, so the 32-bit popcnt
instruction can be used instead of the 64-bit one.
(Clang 22.1.0 optimizes func1 through func5, but misses func6.)
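For comparison, the narrowing can be written by hand at the source level. The following is only a sketch of a possible workaround, not something GCC is confirmed to treat identically in every case: the helper name popcount_fits32 and the *_narrow function names are hypothetical, and the helper assumes the caller has already established that the value fits in 32 bits (unsigned int is 32 bits wide on x86-64).

```c
/* Hypothetical helper, not part of GCC or of the examples above.
   Assumption: the caller guarantees x <= 0xFFFFFFFF, so the cast to
   unsigned int discards no set bits and the count is unchanged. */
static inline int popcount_fits32(unsigned long long x) {
    return __builtin_popcount((unsigned int)x);
}

/* func1 with the narrowing made explicit: the mask keeps only the low 32 bits. */
int func1_narrow(unsigned long long x) {
    return popcount_fits32(x & 0x87878787);
}

/* func2 with the narrowing made explicit: the shift leaves at most 32 bits. */
int func2_narrow(unsigned long long x) {
    return popcount_fits32(x >> 32);
}
```

Because __builtin_popcount() takes an unsigned int, this should let the compiler choose the 32-bit popcnt form without relying on value-range information. The requested optimization would make such manual casts unnecessary.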