https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124838

            Bug ID: 124838
           Summary: __builtin_clzl(0) overoptimization even on targets
                    with native lzcnt instruction
           Product: gcc
           Version: 15.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zoltan at hidvegi dot com
  Target Milestone: ---

Consider this testcase:

unsigned long bar(unsigned long max)
{
    unsigned long m = __builtin_clzl(max);
    unsigned long n = 64 - m;
    return n ? n : 0xbad;
}

GCC compiles this so that it never returns 0xbad. That is fine in theory, since
the result of __builtin_clzl(0) is undefined. But most modern CPUs have an
lzcnt instruction that is well-defined for 0, and there is no way to generate
it properly without inline assembly. __builtin_clzl(x) does generate the proper
instruction, but the compiler assumes that the result is in the range 0..63 and
misoptimizes code that relies on 64. Using __builtin_ia32_lzcnt_u64 works
properly on x86, but it is architecture-specific, and PowerPC, for example, has
no equivalent builtin. It would be great to have a target-independent builtin
for clz that is defined at 0, kind of like __builtin_ffs for trailing zeros
(except that __builtin_ffs generates inefficient code, but that's a different
issue).
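
For illustration, the semantics being asked for can be approximated portably
with an explicit zero check. This is only a sketch: clzl_defined and
bar_guarded are hypothetical names, not existing builtins, and whether the
guard collapses into a single lzcnt depends on the target and the flags used.

```c
#include <limits.h>

/* Hypothetical helper (not a GCC builtin): count leading zeros of x and
   return the full bit width of unsigned long when x is 0. */
static inline unsigned clzl_defined(unsigned long x)
{
    const unsigned width = sizeof(unsigned long) * CHAR_BIT;
    /* The guard keeps __builtin_clzl away from its undefined input. */
    return x ? (unsigned)__builtin_clzl(x) : width;
}

/* The testcase above, rewritten on top of the guarded helper. */
unsigned long bar_guarded(unsigned long max)
{
    unsigned long n = sizeof(unsigned long) * CHAR_BIT - clzl_defined(max);
    return n ? n : 0xbad;
}
```

With the guard in place the zero input is well-defined at the source level, so
bar_guarded(0) takes the 0xbad branch.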

This is somewhat related to this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103
