| | |
| --- | --- |
| Issue | 63848 |
| Summary | Consider reducing 64-bit shifts with clamped shift amounts to 32-bit |
| Labels | backend:AMDGPU, missed-optimization |
| Assignees | bcahoon |
| Reporter | arsenm |
Currently the backend tries to reduce 64-bit shifts by constant amounts to 32-bit shifts when possible. This can be extended to cases where the shift amount is variable but has known bounds: https://alive2.llvm.org/ce/z/_56Y57
```llvm
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.umax.i32(i32, i32)

; noundef is to stop alive timeouts
define i64 @src(i64 noundef %arg0, i32 noundef %arg1) {
  ; shift amount is clamped to be at least 32
  %min = call i32 @llvm.umax.i32(i32 %arg1, i32 32)
  %shift.amt = zext i32 %min to i64
  %shl = shl i64 %arg0, %shift.amt
  ret i64 %shl
}

define i64 @tgt(i64 %arg0, i32 %arg1) {
  %lo.bits = trunc i64 %arg0 to i32
  ; shift amount relative to the high half, clamped at 0
  %sub = add i32 %arg1, -32
  %min = call i32 @llvm.smax.i32(i32 %sub, i32 0)
  %shl = shl i32 %lo.bits, %min
  ; place the 32-bit result in the high half; the low half is known zero
  %insert.1 = insertelement <2 x i32> <i32 0, i32 poison>, i32 %shl, i64 1
  %bitcast = bitcast <2 x i32> %insert.1 to i64
  ret i64 %bitcast
}
```
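The same reduction should apply to right shifts with a lower-bounded amount. Here is a sketch of my own following the same structure (the `@src_lshr`/`@tgt_lshr` names are mine and this pair is not Alive2-verified): once the amount is known to be at least 32, a 64-bit lshr only consumes the high 32 bits of the input and produces zero in the high 32 bits of the result.
```llvm
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.umax.i32(i32, i32)

define i64 @src_lshr(i64 noundef %arg0, i32 noundef %arg1) {
  ; shift amount is clamped to be at least 32
  %clamped = call i32 @llvm.umax.i32(i32 %arg1, i32 32)
  %shift.amt = zext i32 %clamped to i64
  %lshr = lshr i64 %arg0, %shift.amt
  ret i64 %lshr
}

define i64 @tgt_lshr(i64 %arg0, i32 %arg1) {
  ; only the high 32 bits of %arg0 can reach the result
  ; (little-endian element 1 is the high half, as in @tgt above)
  %vec = bitcast i64 %arg0 to <2 x i32>
  %hi.bits = extractelement <2 x i32> %vec, i64 1
  ; shift amount relative to the high half, clamped at 0
  %sub = add i32 %arg1, -32
  %amt = call i32 @llvm.smax.i32(i32 %sub, i32 0)
  %lshr32 = lshr i32 %hi.bits, %amt
  ; high 32 bits of the result are known zero
  %res = zext i32 %lshr32 to i64
  ret i64 %res
}
```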
If I codegen the @src/@tgt pair, then on targets with quarter-rate 64-bit shifts I believe the @tgt form wins out in cycle count (for much larger code size).
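The explicit clamp is not essential either; any pattern that lets known-bits analysis prove a lower bound of 32 on the shift amount would admit the same rewrite. Another sketch of my own (again not Alive2-verified), where an `or` with 32 supplies the bound and the smax clamp becomes unnecessary:
```llvm
define i64 @src_or(i64 noundef %arg0, i32 noundef %arg1) {
  ; bit 5 of the amount is forced, so known bits give amount >= 32
  %amt = or i32 %arg1, 32
  %shift.amt = zext i32 %amt to i64
  %shl = shl i64 %arg0, %shift.amt
  ret i64 %shl
}

define i64 @tgt_or(i64 %arg0, i32 %arg1) {
  %lo.bits = trunc i64 %arg0 to i32
  ; amount relative to the high half; always non-negative since %amt >= 32
  %amt = or i32 %arg1, 32
  %sub = add i32 %amt, -32
  %shl = shl i32 %lo.bits, %sub
  ; place the 32-bit result in the high half; the low half is known zero
  %insert = insertelement <2 x i32> <i32 0, i32 poison>, i32 %shl, i64 1
  %res = bitcast <2 x i32> %insert to i64
  ret i64 %res
}
```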