https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

--- Comment #26 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Fredrik Hederstierna from comment #23)
> 
> Here is another small example I tested yesterday that also gives
> unnecessary moves; I tested arm7tdmi, arm966e-s and cortex-m0.
> 
> extern void func(int data);
> char cdata[4];
> void test(void) {
>   int *idata = (int*)cdata;
>   func(*idata);
> }
> 
> Compiles with GCC 4.8.5 (cortex-m0):
> 
> 00000000 <test>:
>    0:   b508            push    {r3, lr}
>    2:   4b07            ldr     r3, [pc, #28]   ; (20 <test+0x20>)
>    4:   7858            ldrb    r0, [r3, #1]
>    6:   781a            ldrb    r2, [r3, #0]
>    8:   0200            lsls    r0, r0, #8
>    a:   4310            orrs    r0, r2
>    c:   789a            ldrb    r2, [r3, #2]
>    e:   78db            ldrb    r3, [r3, #3]
>   10:   0412            lsls    r2, r2, #16
>   12:   4310            orrs    r0, r2
>   14:   061b            lsls    r3, r3, #24
>   16:   4318            orrs    r0, r3
>   18:   f7ff fffe       bl      0 <func>
>   1c:   bd08            pop     {r3, pc}
>   1e:   46c0            nop                     ; (mov r8, r8)
>   20:   00000000        .word   0x00000000
> 
> With GCC 6 master with latest LRA patch (+4 bytes):
> 
> 00000000 <test>:
>    0:   b510            push    {r4, lr}
>    2:   4c08            ldr     r4, [pc, #32]   ; (24 <test+0x24>)
>    4:   7863            ldrb    r3, [r4, #1]
>    6:   7821            ldrb    r1, [r4, #0]
>    8:   78a0            ldrb    r0, [r4, #2]
>    a:   021b            lsls    r3, r3, #8
>    c:   430b            orrs    r3, r1
>    e:   0400            lsls    r0, r0, #16
>   10:   001a            movs    r2, r3   ??? MOVE
>   12:   0003            movs    r3, r0   ??? MOVE
>   14:   78e0            ldrb    r0, [r4, #3]
>   16:   4313            orrs    r3, r2
>   18:   0600            lsls    r0, r0, #24
>   1a:   4318            orrs    r0, r3
>   1c:   f7ff fffe       bl      0 <func>
>   20:   bd10            pop     {r4, pc}
>   22:   46c0            nop                     ; (mov r8, r8)
>   24:   00000000        .word   0x00000000
> 
> Kind Regards, Fredrik

I found the root of the problem.

We have

  insn 9: p115=p114|p112
  ...
  insn 12: p118=p117|p115
  ...

IRA assigns different hard regs to p112, p115, and p118:

      ...
      Popping a0(r121,l0)  -- assign reg 0
      Popping a1(r118,l0)  -- assign reg 3
      Popping a5(r115,l0)  -- assign reg 2
      Popping a8(r112,l0)  -- assign reg 1
      ...

Therefore LRA generates a redundant move, insn 22, for insn 9 and
another, insn 23, for insn 12, because in each of these insns one
input operand and the output operand must be in the same register.
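
To make the cost of the tie concrete, here is a minimal sketch in C.
It is not GCC code.  It models each OR insn as dst = src1 | src2 with
the requirement that dst shares a hard register with one of the
inputs, and counts how many insns would need an extra move.  The
struct, the function name and the two tables are hypothetical, and the
hard registers used for p114 and p117 are assumed for illustration
only (the dump above does not show them).  The model also ignores
whether the tied input is still live afterwards.

```c
#include <stddef.h>

/* One two-address OR: dst = src1 | src2.  Fields hold hard register
   numbers.  The output must end up in the same hard register as one
   of the inputs (the operation is commutative, so either will do). */
struct or_insn {
    int dst;
    int src1;
    int src2;
};

/* Count the insns whose output register matches neither input.  In
   this simplified model each such insn costs one extra register move. */
size_t count_extra_moves(const struct or_insn *insns, size_t n)
{
    size_t moves = 0;
    for (size_t i = 0; i < n; i++) {
        if (insns[i].dst != insns[i].src1 && insns[i].dst != insns[i].src2)
            moves++;
    }
    return moves;
}

/* Assignment from the IRA dump: p112 -> r1, p115 -> r2, p118 -> r3.
   ASSUMPTION: p114 -> r3 and p117 -> r0 are made up for this example. */
static const struct or_insn ira_assignment[] = {
    { 2, 3, 1 },   /* insn 9:  p115 = p114 | p112 */
    { 3, 0, 2 },   /* insn 12: p118 = p117 | p115 */
};

/* Hypothetical alternative: p112, p115 and p118 all share r1. */
static const struct or_insn shared_assignment[] = {
    { 1, 3, 1 },   /* insn 9:  p115 = p114 | p112 */
    { 1, 0, 1 },   /* insn 12: p118 = p117 | p115 */
};
```

Calling count_extra_moves() on the first table gives 2, matching the
two moves in the GCC 6 output, and 0 on the second table, where the
three pseudos share one hard register.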

There are no conflicts that prevent assigning the same hard reg to
p112, p115, and p118, but IRA does not do this because its heuristics
take the costs of other conflicting pseudos into account.

So the solution is to change the heuristics somehow.  Even if I manage
to do this, the changes will have to be benchmarked thoroughly on
other architectures.  This means the PR will take a long time to fix,
but I am going to work on it.
