Issue 177913
Summary [AMDGPU] Scheduling fails to rematerialize copysign immediate over call
Labels backend:AMDGPU, missed-optimization
Assignees
Reporter arsenm
    This [testcase](https://godbolt.org/z/vfqe3Ksrz) shows a regression between the `good` and `bad` functions caused by poor CSE / rematerialization decisions. `good` uses a manual expansion of copysign that works in context; `bad` uses the `llvm.copysign` intrinsic directly. `bad` has an extra CSR SGPR spill (visible as an additional pair of v_readlane_b32 and v_writelane_b32).


Coming out of instruction selection, the fabs before the call cannot be folded into a source modifier because it feeds a function argument, so it is emitted as:
```
  %16:sreg_32 = S_MOV_B32 2147483647
  %17:vgpr_32 = V_AND_B32_e64 %16:sreg_32, killed %15:vgpr_32, implicit $exec
```

Later in the function, after the call, for the copysign, the same constant is used:

```
  %38:sreg_32 = S_MOV_B32 2147483647
  %39:vgpr_32 = V_BFI_B32_e64 killed %38:sreg_32, killed %36:vgpr_32, killed %37:vgpr_32, implicit $exec
```
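Why both operations need the same immediate can be sketched in a few lines of Python. This is only an illustrative model, not backend code: the helper names (`split_f64`, `join_f64`, `fabs_f64`, `copysign_f64`) are hypothetical, and `v_and_b32` / `v_bfi_b32` are plain-integer stand-ins for the two instructions, with the bitfield insert modeled as `(src0 & src1) | (~src0 & src2)`. The constant 2147483647 is 0x7FFFFFFF, i.e. every bit of the high dword of an f64 except the sign bit.

```python
import struct

# 2147483647 == 0x7FFFFFFF: the S_MOV_B32 immediate from the MIR snippets.
SIGN_CLEAR_MASK = 2147483647


def split_f64(x):
    """Split a double into (high dword, low dword) of its bit pattern."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 32, bits & 0xFFFFFFFF


def join_f64(hi, lo):
    """Rebuild a double from its high and low dwords."""
    return struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))[0]


def v_and_b32(src0, src1):
    """Integer model of a 32-bit AND."""
    return src0 & src1 & 0xFFFFFFFF


def v_bfi_b32(src0, src1, src2):
    """Integer model of bitfield insert: take src1 where the mask is set, src2 elsewhere."""
    return ((src0 & src1) | (~src0 & src2)) & 0xFFFFFFFF


def fabs_f64(x):
    # fabs: clear the sign bit in the high dword with the mask.
    hi, lo = split_f64(x)
    return join_f64(v_and_b32(SIGN_CLEAR_MASK, hi), lo)


def copysign_f64(mag, sign):
    # copysign: magnitude bits from `mag`, sign bit from `sign`, same mask.
    mag_hi, mag_lo = split_f64(mag)
    sign_hi, _ = split_f64(sign)
    return join_f64(v_bfi_b32(SIGN_CLEAR_MASK, mag_hi, sign_hi), mag_lo)
```

Both helpers consume the identical mask value, which is what makes the two S_MOV_B32 instructions candidates for CSE.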

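MachineCSE merges the second S_MOV_B32 into the first, which extends the live range of the constant across the call. Arguably that merge should not have happened in the first place, but even so the constant should have been rematerialized after the call rather than kept live across it. Keeping it live across the call appears to be what costs the extra CSR SGPR.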


```
target triple = "amdgcn-amd-amdhsa"

define double @good(double %x, i32 %y.arg) #0 {
bb:
  %y = or i32 %y.arg, 1
  %__fabs = call fast double @llvm.fabs.f64(double %x)
  %__log2 = call fast double @_Z4log2d(double %__fabs)
  %pownI2F = sitofp i32 %y to double
  %__ylogx = fmul fast double %__log2, %pownI2F
  %__exp2 = call fast double @_Z4exp2d(double %__ylogx)
  %i = bitcast double %x to i64
  %__pow_sign = and i64 %i, -9223372036854775808
  %i1 = bitcast double %__exp2 to i64
  %i2 = or i64 %__pow_sign, %i1
  %i3 = bitcast i64 %i2 to double
  ret double %i3
}

define double @bad(double %x, i32 %y.arg) #0 {
bb:
  %y = or i32 %y.arg, 1
  %__fabs = call fast double @llvm.fabs.f64(double %x)
  %__log2 = call fast double @_Z4log2d(double %__fabs)
  %pownI2F = sitofp i32 %y to double
  %__ylogx = fmul fast double %__log2, %pownI2F
  %__exp2 = call fast nofpclass(nan ninf nzero nsub nnorm) double @_Z4exp2d(double %__ylogx)
  %i = call double @llvm.copysign.f64(double %__exp2, double %x)
  ret double %i
}

declare hidden double @_Z4powndi(double, i32) #0
declare double @_Z4exp2d(double) #1
declare double @llvm.fabs.f64(double) #2
declare double @_Z4log2d(double) #1
declare double @llvm.copysign.f64(double, double) #2

attributes #0 = { nounwind }
attributes #1 = { nounwind memory(read) }
attributes #2 = { nocallback nocreateundeforpoison nofree nosync nounwind speculatable willreturn memory(none) }

```
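The `good` function above ORs the sign bit of `%x` into the bit pattern of the exp2 result instead of calling copysign. A small Python model shows when that shortcut agrees with a real copysign. The helper names are hypothetical, and the reading that the expansion "works in context" because the exp2 result has its sign bit clear is an assumption here, not something the report states outright.

```python
import struct

# i64 -9223372036854775808 viewed as unsigned: only the f64 sign bit is set.
SIGN_BIT = 0x8000000000000000
ALL_BITS = 0xFFFFFFFFFFFFFFFF


def to_bits(x):
    """Bit pattern of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b):
    """Double with the given 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def or_sign_expansion(mag, sign):
    # Mirrors `good`: (bits(sign) & SIGN_BIT) | bits(mag).
    # The sign bit of `mag` is never cleared.
    return from_bits((to_bits(sign) & SIGN_BIT) | to_bits(mag))


def copysign_ref(mag, sign):
    # Full copysign: clear the sign bit of `mag`, then insert the sign of `sign`.
    return from_bits((to_bits(mag) & (ALL_BITS ^ SIGN_BIT)) | (to_bits(sign) & SIGN_BIT))
```

For a non-negative magnitude the two agree, e.g. `or_sign_expansion(8.0, -2.0)` and `copysign_ref(8.0, -2.0)` both give `-8.0`. For a negative magnitude they differ: `or_sign_expansion(-8.0, 2.0)` stays `-8.0` while `copysign_ref(-8.0, 2.0)` is `8.0`.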