https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

            Bug ID: 102391
           Summary: Failure to optimize 2 8-bit loads into a single 16-bit
                    load
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
        (RomHeader[offset + 1] << 8);
}

On little-endian targets, this can be optimized into a single 16-bit load. At
-O3, LLVM performs this optimization, but GCC does not.
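
For reference, here is a minimal sketch of the single-load form the byte-wise code is equivalent to, assuming a little-endian host. The name HeaderReadU16LE_single is hypothetical and used only for illustration. It relies on the common memcpy idiom, which compilers typically lower to one 16-bit load; that lowering is an assumption here, not something checked against this GCC version.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form from the report: two 8-bit loads combined with a shift. */
uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
        (RomHeader[offset + 1] << 8);
}

/* Hypothetical equivalent, valid only on little-endian hosts: copy two
   bytes straight into a uint16_t. memcpy avoids the alignment and
   aliasing problems of casting the pointer to uint16_t *. */
uint16_t HeaderReadU16LE_single(int offset, const uint8_t *RomHeader)
{
    uint16_t value;
    memcpy(&value, RomHeader + offset, sizeof value);
    return value;
}
```

On a little-endian host both functions return the same value for any in-bounds offset; on a big-endian host the second one would additionally need a byte swap.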

This winds up affecting the resulting assembly quite a bit:

AMD64 GCC:

HeaderReadU16LE:
  movsx rdi, edi
  movzx edx, BYTE PTR [rsi+1+rdi]
  movzx eax, BYTE PTR [rsi+rdi]
  sal edx, 8
  or eax, edx
  ret

AMD64 LLVM:

HeaderReadU16LE:
  movsxd rax, edi
  movzx eax, word ptr [rsi + rax]
  ret
