https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

            Bug ID: 102391
           Summary: Failure to optimize 2 8-bit loads into a single 16-bit load
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] | (RomHeader[offset + 1] << 8);
}

This can be optimized into a single 16-bit load. On -O3, this optimization is
done by LLVM, but not by GCC. This winds up affecting the resulting assembly
quite a bit:

AMD64 GCC:

HeaderReadU16LE:
  movsx rdi, edi
  movzx edx, BYTE PTR [rsi+1+rdi]
  movzx eax, BYTE PTR [rsi+rdi]
  sal edx, 8
  or eax, edx
  ret

AMD64 LLVM:

HeaderReadU16LE:
  movsxd rax, edi
  movzx eax, word ptr [rsi + rax]
  ret
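
For illustration only, here is a minimal sketch of the equivalence the requested optimization relies on. HeaderReadU16Native is a hypothetical helper that is not part of the report; it performs one 16-bit load through memcpy, which sidesteps alignment and aliasing concerns. The sketch assumes a little-endian target such as AMD64; on a big-endian target the two functions return different values.

```c
#include <stdint.h>
#include <string.h>

/* Form from the report: two 8-bit loads combined with a shift and an OR. */
uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] | (RomHeader[offset + 1] << 8);
}

/* Hypothetical helper (assumption, not from the report): a single 16-bit
   load in host byte order. It matches HeaderReadU16LE only on a
   little-endian target. */
uint16_t HeaderReadU16Native(int offset, const uint8_t *RomHeader)
{
    uint16_t value;
    memcpy(&value, RomHeader + offset, sizeof value);
    return value;
}
```

On a little-endian host both functions should return the same value for any in-bounds offset, which is why a compiler is free to emit the single-load form for the first one.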