Issue 83840
Summary [X86][AVX] Recognise out of bounds AVX2 shift amounts
Labels good first issue, backend:X86, missed-optimization
Assignees
Reporter RKSimon
    Pulled out of #39822 which was a bit too general.

Unlike the general ISD SRA/SRL/SHL nodes, the AVX2 variable vector shift nodes X86ISD::VSRAV/VSRLV/VSHLV handle out-of-bounds shift amounts:

- VSRAV clamps the unsigned shift amount to (BITWIDTH-1)
- VSRLV/VSHLV return zero for unsigned shift amounts greater than (BITWIDTH-1)
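The per-lane semantics above can be sketched with a small scalar model (a Python illustration with hypothetical helper names, not LLVM code):

```python
# Scalar models of the per-lane X86ISD shift semantics for BITWIDTH = 32.
MASK = 0xFFFFFFFF

def vsrav32(x, amt):
    # VSRAV: arithmetic shift; the amount is clamped to 31, so lanes with
    # out-of-bounds amounts fill with the sign bit.
    s = x - (1 << 32) if x & 0x80000000 else x  # reinterpret as signed
    return (s >> min(amt, 31)) & MASK

def vsrlv32(x, amt):
    # VSRLV: logical right shift; out-of-bounds amounts produce zero.
    return (x >> amt) if amt <= 31 else 0

def vshlv32(x, amt):
    # VSHLV: left shift; out-of-bounds amounts produce zero.
    return ((x << amt) & MASK) if amt <= 31 else 0

assert vsrav32(0x80000000, 40) == 0xFFFFFFFF  # clamped to >> 31, sign-fills
assert vsrlv32(0x80000000, 40) == 0           # zeroed, not clamped
assert vshlv32(1, 32) == 0                    # zeroed, not clamped
```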

So when lowering vector shifts, we should be able to fold away these shift-amount clamp patterns and use the X86ISD node types directly.

e.g.

```ll
define <4 x i32> @ashr(<4 x i32> %sh, <4 x i32> %amt) {
  %elt.min.i = tail call <4 x i32> @llvm.umin.v4i32(<4 x i32> %amt, <4 x i32> <i32 31, i32 31, i32 31, i32 31>)
  %shr = ashr <4 x i32> %sh, %elt.min.i
  ret <4 x i32> %shr
}
```
-> 
```asm
ashr(int vector[4], unsigned int vector[4]):
        vpbroadcastd    xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = [31,31,31,31]
        vpminud xmm1, xmm1, xmm2
        vpsravd xmm0, xmm0, xmm1
        ret
```
vs
```asm
ashr(int vector[4], unsigned int vector[4]):
        vpsravd xmm0, xmm0, xmm1
        ret
```

Logical shifts are trickier but also foldable:
```ll
define <4 x i32> @lshr(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %shr = lshr <4 x i32> %sh, %amt
  %0 = select <4 x i1> %cmp.i, <4 x i32> %shr, <4 x i32> zeroinitializer
  ret <4 x i32> %0
}

define <4 x i32> @lshr2(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %0 = select <4 x i1> %cmp.i, <4 x i32> %sh, <4 x i32> zeroinitializer
  %shr = lshr <4 x i32> %0, %amt
  ret <4 x i32> %shr
}
```
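Both `@lshr` patterns compute the same value as the VSRLV out-of-bounds-zeroing behaviour; a quick Python check (an illustrative model, not part of the patch) over all amounts confirms this:

```python
# Model of X86ISD::VSRLV per-lane semantics: amounts > 31 yield zero.
def vsrlv32(x, amt):
    return (x >> amt) if amt <= 31 else 0

# @lshr: shift first, then select zero when the amount is out of range.
def lshr_then_select(x, amt):
    return (x >> amt) if amt < 32 else 0

# @lshr2: zero the source first, then shift (0 >> anything == 0).
def select_then_lshr(x, amt):
    src = x if amt < 32 else 0
    return src >> amt

# Exhaustive over amounts, spot-checked over values: both patterns
# match the native VSRLV semantics, so the select is foldable.
for amt in range(64):
    for x in (0, 1, 0x80000000, 0xDEADBEEF, 0xFFFFFFFF):
        assert lshr_then_select(x, amt) == vsrlv32(x, amt)
        assert select_then_lshr(x, amt) == vsrlv32(x, amt)
```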