Issue 63848
Summary Consider reducing 64-bit shifts with clamped shift amounts to 32-bit
Labels backend:AMDGPU, missed-optimization
Assignees bcahoon
Reporter arsenm
Currently the backend tries to reduce a 64-bit shift by a constant amount to 32-bit shifts when possible. This can be extended to shift amounts that are variable but have known bounds: https://alive2.llvm.org/ce/z/_56Y57

```
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.umax.i32(i32, i32)

; noundef is to stop alive timeouts
define i64 @src(i64 noundef %arg0, i32 noundef %arg1) {
  ; The shift amount is clamped to be at least 32.
  %clamped = call i32 @llvm.umax.i32(i32 %arg1, i32 32)
  %shift.amt = zext i32 %clamped to i64
  %shl = shl i64 %arg0, %shift.amt
  ret i64 %shl
}

define i64 @tgt(i64 %arg0, i32 %arg1) {
  ; With the amount known to be >= 32, only the low 32 bits of %arg0 can
  ; reach the result: shift them left by (amount - 32) into the high word
  ; and zero the low word.
  %lo.bits = trunc i64 %arg0 to i32
  %sub = add i32 %arg1, -32
  %clamped = call i32 @llvm.smax.i32(i32 %sub, i32 0)
  %shl = shl i32 %lo.bits, %clamped
  %insert.1 = insertelement <2 x i32> <i32 0, i32 poison>, i32 %shl, i64 1
  %bitcast = bitcast <2 x i32> %insert.1 to i64
  ret i64 %bitcast
}
```
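
For reference, here is a minimal sketch of what the corresponding DAG combine could look like, assuming it sits next to the existing constant-amount splitting in AMDGPUISelLowering.cpp; the helper name splitShlByKnownMin and its placement are hypothetical, not an actual patch. It uses computeKnownBits to prove the amount is always >= 32, then emits the same trunc/sub/shl/build_vector/bitcast sequence as @tgt above:

```
// Hypothetical helper (sketch only, assumes the usual AMDGPUISelLowering.cpp
// context): split a 64-bit shl whose amount is provably >= 32 into a 32-bit
// shl of the low half, placed into the high word of the result.
static SDValue splitShlByKnownMin(SDNode *N, SelectionDAG &DAG) {
  SDValue LHS = N->getOperand(0);
  SDValue Amt = N->getOperand(1);
  if (N->getValueType(0) != MVT::i64)
    return SDValue();

  // Prove the shift amount is always >= 32; otherwise the low half of the
  // result may be nonzero and the split below is invalid.
  KnownBits Known = DAG.computeKnownBits(Amt);
  if (Known.getMinValue().ult(32))
    return SDValue();

  SDLoc SL(N);
  // Only the low 32 bits of the source can reach the result.
  SDValue Lo = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, LHS);
  // amount - 32 is non-negative here, so no smax clamp is needed.
  SDValue Amt32 = DAG.getZExtOrTrunc(Amt, SL, MVT::i32);
  SDValue NewAmt = DAG.getNode(ISD::SUB, SL, MVT::i32, Amt32,
                               DAG.getConstant(32, SL, MVT::i32));
  SDValue Hi = DAG.getNode(ISD::SHL, SL, MVT::i32, Lo, NewAmt);

  // Result = high word Hi, low word 0, built as <2 x i32> then bitcast.
  SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
  SDValue Vec = DAG.getBuildVector(MVT::v2i32, SL, {Zero, Hi});
  return DAG.getNode(ISD::BITCAST, SL, MVT::i64, Vec);
}
```

This roughly mirrors what the constant-amount path already emits, with the KnownBits minimum standing in for the explicit constant check.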

If I codegen these two versions, I believe @tgt wins out in cycle count on quarter-rate 64-bit shift targets (at the cost of much larger code size), since the quarter-rate 64-bit shift is replaced by a handful of full-rate 32-bit ops.