https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65162
Bug ID: 65162 Summary: [5 Regression][SH] Redundant tests when storing bswapped T bit result in unaligned mem Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: olegendo at gcc dot gnu.org Target: sh*-*-* The following example is taken from libav, which contains quite some uses of this code pattern. typedef unsigned short int uint16_t; union unaligned_16 { uint16_t l; } __attribute__((packed)); int test (unsigned char* buf, int bits_per_component) { (((union unaligned_16 *)(buf))->l) = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0); return 0; } compiled with 4.8 / 4.9 and -m4 -O2: mov r5,r0 cmp/eq #10,r0 movt r0 swap.b r0,r0 extu.w r0,r0 extu.b r0,r1 shlr8 r0 mov.b r1,@r4 mov.b r0,@(1,r4) rts mov #0,r0 compiled with 5.0 r220892: mov r5,r0 cmp/eq #10,r0 movt r1 swap.b r1,r1 // r1 = T << 8 extu.w r1,r1 tst r1,r1 // T = r1 == 0 = 1-T mov #-1,r0 negc r0,r0 // r0 = 1-T = r1 != 0 = cmp/eq #10,r0 extu.b r1,r2 mov.b r0,@(1,r4) // @(1,r4) = cmp/eq #10,r0 mov.b r2,@r4 // @r4 = 0 rts mov #0,r0 This is caused by the introduction of the treg_set_expr stuff. For some reason, combine tries to recalculate the 'other' stored byte and tries this pattern: Successfully matched this instruction: (set (reg:SI 179) (ne:SI (reg:SI 164 [ D.1476 ]) (const_int 0 [0]))) The resulting insn sequence looks roughly like this: (set (reg:SI 169) (eq:SI (reg:SI 5 r5) (const_int 10))) (set (reg:HI 170) (rotate:HI (subreg:HI (reg:SI 169) 0) (const_int 8))) (set (reg:SI 164) (zero_extend:SI (reg:HI 170))) (set (reg:SI 171) (zero_extend:SI (subreg:QI (reg:SI 164) 0))) (set (mem:QI (reg/v/f:SI 166) (subreg:QI (reg:SI 171) 0))) (set (reg:SI 179) (ne:SI (reg:SI 164) (const_int 0))) (set (mem:QI (plus:SI (reg/v/f:SI 166) (const_int 1))) (subreg:QI (reg:SI 179) 0)) If the ne:SI pattern is not matched, the redundant comparison/test is not emitted. The sh_treg_combine.cc RTL was actually meant to handle such cases of repeated T bit inversions/tests. However, at the moment it is triggered by conditional insns only. Moreover, it currently will not look through zero_extend and the rotate insns. On the other hand, this seems to happen only for unaligned stores. Examples such as typedef unsigned short int uint16_t; int test_00 (unsigned short* buf, int bits_per_component) { buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0); return 0; } int test_01 (unsigned char* buf, int bits_per_component) { buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0); return 0; } int test_02 (unsigned char* buf, int bits_per_component) { buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8; return 0; } int test_03 (unsigned char* buf, int bits_per_component) { buf[1] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 0; buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8; return 0; } .. do not suffer from the problem. Probably this problem will not be triggered after unaligned loads/stores have been improved (PR 64306).