There is a claim in Linux asm-i386/byteswap.h that:

/* Do not define swab16. Gcc is smart enough to recognize "C" version and
   convert it into rotation or exhange. */

Not really. Consider these two testcases:

--cut here--
unsigned short bad(unsigned short a)
{
        return ((a & 0xff00) >> 8 | (a & 0x00ff) << 8);
}

unsigned short good(unsigned short a)
{
        return (a >> 8 | a << 8);
}
--cut here--

gcc -O2 -S -m32 -fomit-frame-pointer

bad:
        movzwl  4(%esp), %edx
        movl    %edx, %eax
        sall    $8, %eax
        shrl    $8, %edx
        orl     %edx, %eax
        movzwl  %ax, %eax
        ret

good:
        movzwl  4(%esp), %eax
        rolw    $8, %ax
        movzwl  %ax, %eax
        ret

IMO both forms should produce the same asm. Unfortunately, the first
(masked) form is the one usually used:

drivers/net/sk98lin/skxmac2.c:
        SWord = ((SWord & 0xff00) >> 8) | ((SWord & 0x00ff) << 8);
drivers/atm/iphase.c:
        #define swap(x) (((x & 0xff) << 8) | ((x & 0xff00) >> 8))
drivers/atm/iphase.c:
        trailer->length = ((skb->len & 0xff) << 8) | ((skb->len & 0xff00) >> 8);

Ideally, this construct should be compiled using a (not yet introduced...)
bswaphi2 pattern to generate "xchgb %ah,%al" instead of the rolw insn.
According to pentopt.pdf, xchg is faster: 1.5 vs. 4 clocks.

Maybe this transformation could also be used for 32-bit and 64-bit data, to
automatically convert open-coded shift series into the bswapsi2 and bswapdi2
patterns.

-- 
           Summary: Missing byte swap optimizations
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
  GCC host triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29749
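
For reference, here is a minimal sketch of the kind of open-coded 32-bit shift series that the proposed transformation might map onto bswapsi2, next to the two 16-bit forms from the testcase. The function names (swab16_masked, swab16_rotate, swab32_open_coded) are made up for illustration and do not come from the kernel or from GCC. The sketch only shows that the forms are semantically equivalent, not what any particular GCC version emits for them.

```c
#include <stdint.h>

/* Masked 16-bit swap: same shape as bad() in the testcase. */
static uint16_t swab16_masked(uint16_t a)
{
        return (uint16_t)(((a & 0xff00u) >> 8) | ((a & 0x00ffu) << 8));
}

/* Unmasked 16-bit swap: same shape as good() in the testcase.
   The cast truncates the promoted int result back to 16 bits. */
static uint16_t swab16_rotate(uint16_t a)
{
        return (uint16_t)((a >> 8) | (a << 8));
}

/* Hypothetical open-coded 32-bit swap: one mask-and-shift per byte.
   This is an assumed example of a "shift series", not code quoted
   from any driver. */
static uint32_t swab32_open_coded(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) |
               ((x & 0x0000ff00u) <<  8) |
               ((x & 0x00ff0000u) >>  8) |
               ((x & 0xff000000u) >> 24);
}
```

Since swab16_masked() and swab16_rotate() return the same value for every 16-bit input, a compiler is free to emit identical code for both.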