https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93040
Bug ID: 93040
Summary: gcc doesn't optimize unaligned accesses to a 16-bit
value on the x86 as well as it does a 32-bit value (or
clang)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: miles at gnu dot org
Target Milestone: ---
Given the following code:
unsigned short get_unaligned_16 (unsigned char *p)
{
  return p[0] | (p[1] << 8);
}

unsigned int get_unaligned_32 (unsigned char *p)
{
  return get_unaligned_16 (p) | (get_unaligned_16 (p + 2) << 16);
}

unsigned int get_unaligned_32_alt (unsigned char *p)
{
  return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
}
Clang/LLVM (trunk, though versions going back a long way give the same
result) generates the following very nice output:
get_unaligned_16:                       # @get_unaligned_16
        movzx   eax, word ptr [rdi]
        ret
get_unaligned_32:                       # @get_unaligned_32
        mov     eax, dword ptr [rdi]
        ret
get_unaligned_32_alt:                   # @get_unaligned_32_alt
        mov     eax, dword ptr [rdi]
        ret
Whereas gcc (trunk, and likewise many older versions) generates:
get_unaligned_16:
        movzx   eax, BYTE PTR [rdi+1]
        sal     eax, 8
        mov     edx, eax
        movzx   eax, BYTE PTR [rdi]
        or      eax, edx
        ret
get_unaligned_32:
        movzx   eax, BYTE PTR [rdi+3]
        sal     eax, 8
        mov     edx, eax
        movzx   eax, BYTE PTR [rdi+2]
        or      eax, edx
        movzx   edx, BYTE PTR [rdi+1]
        sal     eax, 16
        mov     ecx, edx
        movzx   edx, BYTE PTR [rdi]
        sal     ecx, 8
        or      edx, ecx
        movzx   edx, dx
        or      eax, edx
        ret
get_unaligned_32_alt:
        mov     eax, DWORD PTR [rdi]
        ret
Notice that in the "get_unaligned_32_alt" version, gcc _does_ detect
that this is really an unaligned access to a 32-bit integer and
reduces it to a single instruction on the x86, as that architecture
supports unaligned accesses.
However, for the 16-bit version, "get_unaligned_16", and for
"get_unaligned_32", which is derived from it, gcc just emits the
component bit-munching operations.
It does seem curious that gcc manages the 32-bit case, but fails on
the 16-bit case...
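For comparison, here is a minimal sketch of the memcpy-based idiom that is
commonly used for unaligned loads. It is not part of the test case above, and
the function names are hypothetical. Compilers generally turn a fixed-size
memcpy like this into a single load on x86, but I have not verified that for
the exact gcc build listed below. Note that the result is in host byte order,
so it matches the shift-and-or versions only on little-endian targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original test case): load a 16-bit
   value from a possibly unaligned address via memcpy.  The result is in
   host byte order, so it equals p[0] | (p[1] << 8) only on little-endian
   targets such as x86.  */
uint16_t get_unaligned_16_memcpy (const unsigned char *p)
{
  uint16_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical 32-bit counterpart, with the same byte-order caveat.  */
uint32_t get_unaligned_32_memcpy (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

This is only a workaround for code that can assume the host byte order; it
does not replace the portable shift-and-or form that this report is about.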
I tested gcc on godbolt.org, and Clang locally (and on godbolt).
Flags used:
-O2 -march=skylake
-Os and -O3 yield the same results.
Versions:
gcc (Compiler-Explorer-Build) 10.0.0 20191220 (experimental)
clang version 10.0.0 (https://github.com/llvm/llvm-project.git
b4dfa74a5d80b3602a5315fac2ef5f98b0e63708)