stefanp added a comment.

In D91279#2390160 <https://reviews.llvm.org/D91279#2390160>, @shchenz wrote:

> Using dform with offset 0 can save one register r0/X0, this is benefit for 
> register allocation? But adding it in `PPCPreEmitPeephole` pass which is 
> after register allocation will make the benefit gone.
> Maybe we need to do it before register allocation? For example at the place 
> where the x-form with zero register is generated.
>
> I checked one example `loadConstant` in  
> `test/CodeGen/PowerPC/f128-passByValue.ll`.
> We generate `LXVX $zero8, ` in ISEL because we meet the worst case and we 
> don't have d-form choice for the instruction selection. so we have to use 
> x-form and in x-form selection, we have to use zero/zero8 as the base and use 
> load address as the index. See `PPCTargetLowering::SelectAddressRegRegOnly`.
>
> I guess most cases are with same reason for generating x-form + zero 
> register, we meet the worst case in ISEL, so we have to use x-form + zero 
> register form, with this form, we can always select a powerpc load/store 
> instruction.
>
> For me, a better solution should be change the worst case handling in ISEL, 
> it is before RA and it is also transparent for types like STXVX/LXVX/ and 
> also LDX/STDX, LFDX/STFDX...

I'm just going to jump in to give a little more background. The initial reason 
we wanted to do this was to enable an optimization that actually happens in the 
linker after the code is emitted.
To get the idea you can look at this test:

  /llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll

Which contains this section:

  ; FIXME: we should always convert X-Form instructions that use
  ; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
  define dso_local void @ReadWrite128() local_unnamed_addr #0 {
  ; CHECK-LABEL: ReadWrite128:
  ; CHECK:       # %bb.0: # %entry
  ; CHECK-NEXT:    pld r3, input128@got@pcrel(0), 1
  ; CHECK-NEXT:    lxvx vs0, 0, r3
  ; CHECK-NEXT:    pld r3, output128@got@pcrel(0), 1
  ; CHECK-NEXT:    stxvx vs0, 0, r3
  ; CHECK-NEXT:    blr
  entry:
    %0 = load i128, i128* @input128, align 16
    store i128 %0, i128* @output128, align 16
    ret void
  }

When we have a GOT access like this it is possible for the compiler to mark the 
instruction with `R_PPC64_PCREL_OPT` and then the linker merges the two 
instructions into one and replaces the second instruction with a `nop`. The 
problem is that this opt can only be done if the second instruction is DForm. 
We noticed that when we implemented this optimization we could not catch all of 
the cases because in some situations (like the one above) we use the XForm 
instead of the DForm.

Having said that, we should try to do this before the PreEmitPeephole. The 
optimization that adds the `R_PPC64_PCREL_OPT` relocation is also in the 
PreEmitPeephole and I'm not sure if it will be detected if we do both things at 
the same time (both as in convert the XForm to a DForm and then have the same 
opt use that DForm to add the relocation).

I agree that ISel is a better place for this. If we cannot do this in ISel then 
we should still try to do this before we get to the PreEmitPeephole or at least 
make sure that both the DForm is present and that the `R_PPC64_PCREL_OPT` 
relocation is added as we expected in the same pass.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91279/new/

https://reviews.llvm.org/D91279

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to