https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124838
Bug ID: 124838
Summary: __builtin_clzl(0) overoptimization even on targets
with native lzcnt instruction
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zoltan at hidvegi dot com
Target Milestone: ---
Consider this testcase:
unsigned long bar(unsigned long max)
{
    unsigned long m = __builtin_clzl(max);
    unsigned long n = 64 - m;
    return n ? n : 0xbad;
}
When this is built, the resulting code never returns 0xbad. This is fine in
theory, since the behavior of __builtin_clzl(0) is undefined. But most modern
CPUs have an lzcnt instruction that is well-defined for 0, and there is no way
to generate it properly without inline assembly. __builtin_clzl(x) does
generate the proper instruction, but the compiler assumes that the result is in
the range 0..63 and misoptimizes code that relies on a result of 64. Using
__builtin_ia32_lzcnt_u64 works properly on x86, but it is architecture
specific, and PowerPC, for example, has no equivalent builtin. It would be
great to have a target-independent builtin for clz that is defined at 0, kind
of like __builtin_ffs for trailing zeros (except that __builtin_ffs generates
inefficient code, but that's a different issue).
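For illustration, here is a minimal sketch of a target-independent wrapper that is defined at 0. The names clzl_defined, bar_defined and ULONG_BITS are hypothetical and not part of GCC. The sketch assumes that the two-argument form of __builtin_clzg (documented for GCC 14 and later) is acceptable where available, and otherwise falls back to an explicit zero check. Whether either path compiles down to a single lzcnt is not guaranteed here and would need to be checked on each target.

```c
#include <assert.h>
#include <limits.h>

/* Bit width of unsigned long (64 on LP64 targets such as x86-64 Linux). */
#define ULONG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical helper, not a GCC builtin: count leading zeros, defined at 0. */
static inline int clzl_defined(unsigned long x)
{
#if defined(__has_builtin)
#  if __has_builtin(__builtin_clzg)
    /* Two-argument form: the second argument is the result when x == 0. */
    return __builtin_clzg(x, ULONG_BITS);
#  else
    /* Explicit guard keeps the undefined case away from __builtin_clzl. */
    return x ? __builtin_clzl(x) : ULONG_BITS;
#  endif
#else
    return x ? __builtin_clzl(x) : ULONG_BITS;
#endif
}

/* The testcase from this report, rewritten on top of the helper. */
unsigned long bar_defined(unsigned long max)
{
    unsigned long n = (unsigned long)(ULONG_BITS - clzl_defined(max));
    return n ? n : 0xbad;
}

/* Small self-check of the zero and non-zero cases. */
static void clz_selfcheck(void)
{
    assert(clzl_defined(0) == ULONG_BITS);
    assert(bar_defined(0) == 0xbad);
    assert(bar_defined(1) == 1);
}
```

With this wrapper the zero case is part of the defined behavior, so the compiler may not assume that n is non-zero.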
This is somewhat related to this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103