https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

            Bug ID: 101139
           Summary: Unable to remove double byteswap in fast path
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The following code is reduced from a real interpreter:

extern void (*a[])();
int d, e, h, l;
typedef struct {
  char ab;
} f;
f g;
short i();
short m68ki_read_imm_16() {
  short j, k;
  int b = d;
  f f = g;
  if (b < h)
    return __builtin_bswap16((&f.ab)[0]);
  k = i();
  short c = k;
  j = __builtin_bswap16(c);
  return j;
}
int b() {
  short m;
  do {
    m = m68ki_read_imm_16();
    short c = m;
    l = __builtin_bswap16(c);
    a[l]();
  } while (e);
  return e;
}

Compiling with arm-linux-gnueabihf-gcc-10 -O2 yields this interesting sequence
in the function:

        b       .L11
.L15:
        ldrb    r3, [r5, #8]    @ zero_extendqisi2
        rev16   r3, r3
        uxth    r3, r3
.L10:
        rev16   r3, r3
        uxth    r3, r3

The original code intention was to have a reusable function that returned in
big-endian, but that a specific use of it would be able to ignore endianness
into a table lookup, removing the double-swap entirely. GCC can normally do
that, but it seems that the branch in m68ki_read_imm_16() somehow gets in the
way. Just to be clear, I expect zero rev16 instructions altogether in b() when
m68ki_read_imm_16() is inlined.

The problem is not ARM-specific; x86 shows a similar problematic sequence:

        leaq    a(%rip), %rbx
        jmp     .L11
        .p2align 4,,10
        .p2align 3
.L15:
        movsbw  g(%rip), %ax
        rolw    $8, %ax
.L10:
        rolw    $8, %ax
        movzwl  %ax, %edx

Also verified with gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)
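
For reference, the identity the report relies on is that two consecutive 16-bit byteswaps cancel. The following is only a minimal sketch of that identity in straight-line code, with no branch between the swaps; the names swap_twice, count_mismatches and self_check are hypothetical and do not appear in the reduced test case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the test case: the callee swaps once
   (as m68ki_read_imm_16() does before returning), then the caller swaps
   again (as b() does before indexing a[]). */
uint16_t swap_twice(uint16_t x)
{
    uint16_t be = __builtin_bswap16(x);  /* first swap, in the callee */
    return __builtin_bswap16(be);        /* second swap, in the caller */
}

/* Counts the 16-bit inputs for which the double swap is not the identity.
   Zero means the two swaps can be dropped without changing the result. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t v = 0; v <= 0xFFFF; v++)
        if (swap_twice((uint16_t)v) != (uint16_t)v)
            bad++;
    return bad;
}

/* Aborts if any input is changed by the double swap. */
void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Comparing the -O2 output of swap_twice() with that of a function that simply returns its argument is one way to see whether the pair is folded when nothing separates the swaps. The sequence reported above is the case where the swaps sit on either side of the branch in m68ki_read_imm_16().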