https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #11 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to Richard Henderson from comment #10)
> Created attachment 37890 [details]
> second patch
> 
> Still going through full testing, but I wanted to post this
> before the end of the day.
> 
> This update includes a virt_or_elim_regno_p, as discussed in #c7/#c8.
> 
> It also updates aarch64_legitimize_address to treat R0+R1+C as a special
> case of R0+(R1*S)+C.  All of the arguments wrt scaling apply to unscaled
> indices as well.
> 
> As a minor point, doing some of the expansion in a slightly different
> order results in less garbage rtl being generated in the process.

Richard,

  I just recalled that reassociating the constant offset with the virtual frame
pointer increases register pressure, and thus causes bad code generation in
some situations. For example, take the testcase given at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173#c8

void bar(int i)
{
  char A[10];
  char B[10];
  char C[10];
  g(A);
  g(B);
  g(C);
  f(A[i]);
  f(B[i]);
  f(C[i]);
  return;
}

  Before your patch we generate (at -O2):
  ===
bar:
        stp     x29, x30, [sp, -80]!
        add     x29, sp, 0
        add     x1, x29, 80
        str     x19, [sp, 16]
        mov     w19, w0
        add     x0, x29, 32
        add     x19, x1, x19, sxtw
        bl      g
        add     x0, x29, 48
        bl      g
        add     x0, x29, 64
        bl      g
        ldrb    w0, [x19, -48]
        bl      f
        ldrb    w0, [x19, -32]
        bl      f
        ldrb    w0, [x19, -16]
        bl      f
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 80
        ret

  After your patch, we are generating:
  ===
bar:
        stp     x29, x30, [sp, -96]!
        add     x29, sp, 0
        stp     x21, x22, [sp, 32]
        add     x22, x29, 48
        stp     x19, x20, [sp, 16]
        mov     w19, w0
        mov     x0, x22
        add     x21, x29, 64
        add     x20, x29, 80
        bl      g
        mov     x0, x21
        bl      g
        mov     x0, x20
        bl      g
        ldrb    w0, [x22, w19, sxtw]
        bl      f
        ldrb    w0, [x21, w19, sxtw]
        bl      f
        ldrb    w0, [x20, w19, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x21, x22, [sp, 32]
        ldp     x29, x30, [sp], 96
        ret

  We are now using more callee-saved registers (x19-x22 instead of just x19),
so extra stp/ldp instructions are generated and the frame grows from 80 to
96 bytes.

  But we will benefit from reassociating the constant offset with the virtual
frame pointer if it's inside a loop, because:

   * vfp + const_offset is loop invariant
   * the virtual reg elimination on vfp will eventually generate one
     extra instruction if vfp is used with another reg rather than with
     const_offset.

  Thus after this reassociation, rtl IVOPT can hoist it out of the loop, and we
will save two instructions in the loop.
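
  As a rough illustration of the loop case (a hypothetical variant of the
testcase above, not taken from PR62173; fill() and sum_local() are made-up
names), the address of A[i] below is vfp + const_offset + i, and only i
changes across iterations:

```c
/* Hypothetical loop variant of the testcase; fill() and sum_local() are
   illustrative names only.  */
static void fill (char *p, int n)
{
  int i;

  for (i = 0; i < n; i++)
    p[i] = (char) i;
}

int sum_local (int n)
{
  char A[10];
  int i, s = 0;

  fill (A, 10);
  if (n > 10)
    n = 10;

  /* The address of A[i] is vfp + const_offset + i.  Grouped as
     (vfp + const_offset) + i, the parenthesised part does not change
     across iterations, so it can be computed once before the loop.  */
  for (i = 0; i < n; i++)
    s += A[i];

  return s;
}
```

  Whether the compiler actually hoists the base address here depends on the
optimization level and on what other passes do to the loop first; the sketch
only shows which part of the address is invariant.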

  A fix was proposed for loop-invariant.c to do such reshuffling only inside
loops, see https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01253.html.  That
patch was eventually dropped because PR62173 was fixed at the tree level, and
the pointer reshuffling was considered to carry a hidden overflow risk,
although one that would be very rare.
