https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65162

            Bug ID: 65162
           Summary: [5 Regression][SH] Redundant tests when storing
                    bswapped T bit result in unaligned mem
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

The following example is taken from libav, which contains quite some uses of
this code pattern.

typedef unsigned short int uint16_t;
union unaligned_16 { uint16_t l; } __attribute__((packed));

int
test (unsigned char* buf, int bits_per_component)
{
  (((union unaligned_16 *)(buf))->l) =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

compiled with 4.8 / 4.9 and -m4 -O2:

        mov     r5,r0
        cmp/eq  #10,r0
        movt    r0
        swap.b  r0,r0
        extu.w  r0,r0
        extu.b  r0,r1
        shlr8   r0
        mov.b   r1,@r4
        mov.b   r0,@(1,r4)
    rts
    mov     #0,r0

compiled with 5.0 r220892:

        mov     r5,r0
        cmp/eq  #10,r0
        movt    r1
        swap.b  r1,r1      // r1 = T << 8
        extu.w  r1,r1
        tst     r1,r1      // T = r1 == 0 = 1-T
        mov     #-1,r0
        negc    r0,r0      // r0 = 1-T = r1 != 0 = cmp/eq #10,r0
        extu.b  r1,r2
        mov.b   r0,@(1,r4) // @(1,r4) = cmp/eq #10,r0
        mov.b   r2,@r4     // @r4 = 0
        rts
        mov     #0,r0

This is caused by the introduction of the treg_set_expr stuff.
For some reason, combine tries to recalculate the 'other' stored byte and tries
this pattern:

Successfully matched this instruction:
(set (reg:SI 179)
    (ne:SI (reg:SI 164 [ D.1476 ])
        (const_int 0 [0])))

The resulting insn sequence looks roughly like this:

(set (reg:SI 169) (eq:SI (reg:SI 5 r5) (const_int 10)))
(set (reg:HI 170) (rotate:HI (subreg:HI (reg:SI 169) 0) (const_int 8)))
(set (reg:SI 164) (zero_extend:SI (reg:HI 170)))
(set (reg:SI 171) (zero_extend:SI (subreg:QI (reg:SI 164) 0)))
(set (mem:QI (reg/v/f:SI 166) (subreg:QI (reg:SI 171) 0)))
(set (reg:SI 179) (ne:SI (reg:SI 164) (const_int 0)))
(set (mem:QI (plus:SI (reg/v/f:SI 166) (const_int 1)))
     (subreg:QI (reg:SI 179) 0))


If the ne:SI pattern is not matched, the redundant comparison/test is not
emitted.

The sh_treg_combine.cc RTL was actually meant to handle such cases of repeated
T bit inversions/tests.  However, at the moment it is triggered by conditional
insns only.  Moreover, it currently will not look through zero_extend and the
rotate insns.

On the other hand, this seems to happen only for unaligned stores.  Examples
such as 

typedef unsigned short int uint16_t;
int
test_00 (unsigned short* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

int
test_01 (unsigned char* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

int
test_02 (unsigned char* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8;

  return 0;
}

int
test_03 (unsigned char* buf, int bits_per_component)
{
  buf[1] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 0;
  buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8;

  return 0;
}

.. do not suffer from the problem.

Probably this problem will not be triggered after unaligned loads/stores have
been improved (PR 64306).

Reply via email to