https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84431

            Bug ID: 84431
           Summary: Suboptimal code for masked shifts (x86/x86-64)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nruslan_devel at yahoo dot com
  Target Milestone: ---

On x86 and x86-64, the shift instructions only use the low bits of the CL register (i.e., the count is masked), so a single instruction cannot shift by more than (WORD_BITS - 1) positions. When open-coding a wider shift, the compiler therefore normally has to check whether the specified shift count exceeds the word size before emitting the corresponding shld/shl instructions (shrd/shr, etc.).
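
For context, here is a rough C sketch (mine, not from the report; the helper name dw_shl is made up) of the branchy double-word left shift the compiler conceptually has to open-code on i386 when the count may be anywhere in 0..63:

/* Sketch only: what a 64-bit left shift by 0..63 conceptually expands to
 * on a 32-bit target.  The hardware shl/shld mask the count to 5 bits,
 * so the shift-by-32-or-more case needs an explicit check.  */
static unsigned long long dw_shl(unsigned int lo, unsigned int hi, unsigned shift)
{
   unsigned int out_lo, out_hi;

   if (shift & 32) {                 /* count >= 32: low word moves into high word */
      out_hi = lo << (shift & 31);
      out_lo = 0;
   } else if (shift == 0) {          /* avoid the undefined (32 - 0)-bit shift below */
      out_hi = hi;
      out_lo = lo;
   } else {
      out_hi = (hi << shift) | (lo >> (32 - shift));
      out_lo = lo << shift;
   }
   return ((unsigned long long)out_hi << 32) | out_lo;
}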

Now, if the shift count comes from a variable, the compiler normally cannot know at compile time whether it exceeds (WORD_BITS - 1), so it has to generate the corresponding checks. On the other hand, when it is known that shift < WORD_BITS, it is very easy to give the compiler a hint by masking the shift count like this (the example below is for i386; for x86-64 the type would be __uint128_t and the mask 63):

unsigned long long func(unsigned long long a, unsigned shift)
{
   return a << (shift & 31);
}
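
The x86-64 analogue mentioned above would look like this (a sketch based on the note in the report, using the __uint128_t extension supported by GCC and clang):

__uint128_t func128(__uint128_t a, unsigned shift)
{
   /* Mask to 63 to hint that the count never reaches the operand width. */
   return a << (shift & 63);
}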

In the ideal scenario, the compiler just has to load the value into CL without even masking it, because the masking is already implied by the shift instruction.

Note that clang/LLVM recognizes this pattern (at least for i386) and generates the following assembly code:
func:                                   # @func
    pushl   %esi
    movl    8(%esp), %esi
    movb    16(%esp), %cl
    movl    12(%esp), %edx
    movl    %esi, %eax
    shldl   %cl, %esi, %edx
    shll    %cl, %eax
    popl    %esi
    retl


GCC generates suboptimal code in this case:
func:
    pushl   %esi
    pushl   %ebx
    movl    20(%esp), %ecx
    movl    16(%esp), %esi
    movl    12(%esp), %ebx
    andl    $31, %ecx
    movl    %esi, %edx
    shldl   %ebx, %edx
    movl    %ebx, %eax
    xorl    %ebx, %ebx
    sall    %cl, %eax
    andl    $32, %ecx
    cmovne  %eax, %edx
    cmovne  %ebx, %eax
    popl    %ebx
    popl    %esi
    ret
