According to Intel Technology Journal [1], page 270, bt instruction runs 20%
faster on Core2 Duo than equivalent generic code.

---Qoute from p.270---
The bit test instruction bt was introduced in the i386™
processor. In some implementations, including the Intel
NetBurst® micro-architecture, the instruction has a high
latency. The Intel Core micro-architecture executes bt in
a single cycle, when the bit base operand is a register.
Therefore, the Intel C++/Fortran compiler uses the bt
instruction to implement a common bit test idiom when
optimizing for the Intel Core micro-architecture. The
optimized code runs about 20% faster than the generic
version on an Intel Core 2 Duo processor. Both of these
versions are shown below:

C source code
  int x, n;
  ...
  if (x & (1 << n)) ...

Generic code generation
  ; edx contains x, ecx contains n.
  mov eax, 1
  shl eax, cl
  test edx, eax
  je taken

Intel Core micro-architecture code generation
  ; edx contains x, eax contains n.
  bt edx, eax
  jae taken
---/Quote---

I have a patch in testing that implements suggested optimization for
TARGET_USE_BT (including core2) targets.

[1] Inside the Intel® 10.1 Compilers: New Threadizer and New
Vectorizer for Intel® Core™2 Processors, Intel Technology Journal, Vol. 11,
Issue 4, November 15, 2007,
http://download.intel.com/technology/itj/2007/v11i4/1-inside/1-Inside_the_Intel_Compilers.pdf


-- 
           Summary: Generate bit test (bt) instructions
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
GCC target triplet: x86


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473

Reply via email to