There is a claim in Linux asm-i386/byteswap.h that:

/* Do not define swab16.  Gcc is smart enough to recognize "C" version and
   convert it into rotation or exhange.  */

Not really. Consider these two testcases:

--cut here--
unsigned short bad(unsigned short a)
{
       return ((a & 0xff00) >> 8 | (a & 0x00ff) << 8);
}


unsigned short good(unsigned short a)
{
       return (a >> 8 | a << 8);
}
--cut here--


gcc -O2 -S -m32 -fomit-frame-pointer
bad:
       movzwl  4(%esp), %edx
       movl    %edx, %eax
       sall    $8, %eax
       shrl    $8, %edx
       orl     %edx, %eax
       movzwl  %ax, %eax
       ret


good:
       movzwl  4(%esp), %eax
       rolw    $8, %ax
       movzwl  %ax, %eax
       ret


IMO both forms should produce equal asm.
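
To back up the claim that the two forms are interchangeable, here is a minimal sketch that compares them over every 16-bit input. The helper name forms_agree() is hypothetical and not part of the testcase above; it only checks the C semantics and says nothing about the generated code.

```c
/* bad() and good() are copied from the testcase; forms_agree() is a
   hypothetical helper added only for this check. */
unsigned short bad(unsigned short a)
{
        return ((a & 0xff00) >> 8 | (a & 0x00ff) << 8);
}

unsigned short good(unsigned short a)
{
        return (a >> 8 | a << 8);
}

/* Returns 1 if both forms agree for all 65536 inputs, 0 otherwise. */
int forms_agree(void)
{
        unsigned int i;

        for (i = 0; i <= 0xffff; i++)
                if (bad((unsigned short)i) != good((unsigned short)i))
                        return 0;
        return 1;
}
```

Since the results are the same for every input, a compiler would be free to emit the same instructions for both.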

Unfortunately, the first form is the one usually used:

drivers/net/sk98lin/skxmac2.c:
SWord = ((SWord & 0xff00) >> 8) | ((SWord & 0x00ff) << 8);

drivers/atm/iphase.c:
#define swap(x) (((x & 0xff) << 8) | ((x & 0xff00) >> 8))  

drivers/atm/iphase.c:
trailer->length = ((skb->len & 0xff) << 8) | ((skb->len & 0xff00) >> 8);


Ideally, this construct should be compiled using a (not yet introduced...)
bswaphi2 pattern to generate "xchgb %ah,%al" instead of the rolw insn. According
to pentopt.pdf, xchg is faster, 1.5 vs 4 clks.

Maybe this transformation could also be used for 32-bit and 64-bit data, to
automatically convert an open-coded shift series into the bswapsi2 and bswapdi2
patterns.
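
As an illustration of what such an open-coded shift series might look like for 32-bit data, here is a sketch. The function names are hypothetical, and the second function assumes the compiler provides __builtin_bswap32 (as recent GCC does); it is only there to show the result the first form should ideally be reduced to.

```c
#include <stdint.h>

/* Open-coded 32-bit byte swap, in the mask-and-shift style seen in drivers.
   The name is made up for this example. */
uint32_t swap32_open_coded(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) |
               ((x & 0x0000ff00u) <<  8) |
               ((x & 0x00ff0000u) >>  8) |
               ((x & 0xff000000u) >> 24);
}

/* Reference form, assuming __builtin_bswap32 is available. */
uint32_t swap32_builtin(uint32_t x)
{
        return __builtin_bswap32(x);
}
```

Both functions return the same value for any input, so the first could in principle be recognized and expanded through the same pattern as the second.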


-- 
           Summary: Missing byte swap optimizations
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29749
