http://llvm.org/bugs/show_bug.cgi?id=17128

            Bug ID: 17128
           Summary: bit-scan-forward / count-trailing-zeros loop not
                    recognized
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified

$ ./clang -v
clang version 3.4 (trunk 189776)
Target: x86_64-apple-darwin11.4.2
Thread model: posix

$ cat tzcnt.c 
int tzcnt(int x) {
   int count = 0;
   int i = 0;
   while ( i<32 && (((x >> i) & 0x1) == 0)) {
      i++;
      count++;
   }
   return count;
}

$ ./clang -S -O3 -fomit-frame-pointer -march=core-avx2 -o /dev/stdout tzcnt.c 
    .section    __TEXT,__text,regular,pure_instructions
    .globl    _tzcnt
    .align    4, 0x90
_tzcnt:                                 ## @tzcnt
    .cfi_startproc
## BB#0:                                ## %entry
    xorl    %eax, %eax
    .align    4, 0x90
LBB0_1:                                 ## %land.rhs
                                        ## =>This Inner Loop Header: Depth=1
    btl    %eax, %edi
    jb    LBB0_3
## BB#2:                                ## %while.body
                                        ##   in Loop: Header=BB0_1 Depth=1
    incl    %eax
    cmpl    $32, %eax
    jl    LBB0_1
LBB0_3:                                 ## %while.end
    ret

...

On CPUs with the BMI feature, I was hoping this loop would be compiled to a
single 'tzcnt' instruction:

tzcnt %edi, %eax

On x86 CPUs without BMI, this loop could also be implemented with the 'bsf'
instruction, preceded by a check for a zero input value. For a zero input, the
compiler would have to return '32' explicitly, because 'bsf' leaves its
destination undefined in that case.
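
To make the requested transformation concrete, here is a minimal sketch of the
two forms side by side. It assumes a 32-bit 'int' and a GCC/Clang-compatible
compiler that provides '__builtin_ctz' (whose result is undefined for a zero
argument, much like 'bsf'). The names 'tzcnt_loop', 'tzcnt_guarded' and
'check_tzcnt_forms' are hypothetical and exist only for this illustration; the
sketch is not a statement of how LLVM would or should implement the
recognition.

```c
#include <assert.h>
#include <limits.h>

/* Same behavior as the loop in tzcnt.c: count the zero bits below the
   lowest set bit, and stop at 32 when no bit is set. */
static int tzcnt_loop(int x) {
    int i = 0;
    while (i < 32 && (((x >> i) & 0x1) == 0)) {
        i++;
    }
    return i;
}

/* Hypothetical guarded form: an explicit zero check followed by a
   count-trailing-zeros primitive. __builtin_ctz is undefined for 0, so
   the guard supplies the operand size (32) for that case. */
static int tzcnt_guarded(int x) {
    if (x == 0) {
        return 32;
    }
    return __builtin_ctz((unsigned int)x);
}

/* Spot-check that both forms agree on a few representative inputs. */
static void check_tzcnt_forms(void) {
    static const int samples[] = { 0, 1, 2, 8, 12, -1, 0x40000000, INT_MIN };
    unsigned int n = sizeof(samples) / sizeof(samples[0]);
    for (unsigned int k = 0; k < n; k++) {
        assert(tzcnt_loop(samples[k]) == tzcnt_guarded(samples[k]));
    }
}
```

With BMI available, the guard in 'tzcnt_guarded' would be unnecessary, since
'tzcnt' already returns the operand size for a zero source.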

According to Intel's Volume 2 ISA reference:
"The key difference between TZCNT and BSF instruction is that TZCNT provides
operand size as output when source operand is zero while in the case of BSF
instruction, if source operand is zero, the content of destination operand are
undefined. On processors that do not support TZCNT, the instruction byte
encoding is executed as BSF."

