| Issue | 181455 |
| --- | --- |
| Summary | [X86] Vector 8-bit `icmp ugt + blend` with constant should use saturation arithmetic to avoid compare |
| Labels | new issue |
| Assignees | |
| Reporter | WalterKruger |

SSE and AVX2 have no unsigned integer vector compare instructions, so on x86 these compares are implemented by checking whether one operand equals the unsigned maximum/minimum of the two (e.g. `a >= b` => `max(a, b) == a`). This method is often paired with `blendv`, which performs a conditional selection:
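```asm
selectIfGreater:
        movdqa   xmm3, xmm0                        # free xmm0, the implicit pblendvb mask register
        movdqa   xmm0, xmmword ptr [rip + .LCPI0]  # load the constant vector
        pminub   xmm0, xmm2                        # xmm0 = unsigned min(constant, xmm2)
        pcmpeqb  xmm0, xmm2                        # 0xFF in lanes where xmm2 <= constant
        pblendvb xmm3, xmm1, xmm0                  # take xmm1 where the mask MSB is set
        movdqa   xmm0, xmm3                        # move the result into the return register
        ret
```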
https://godbolt.org/z/4Gvoqrj1P
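
As a rough illustration of why the min-plus-equality sequence acts as an unsigned compare, here is a scalar per-lane model of the listing in Python. It is a sketch under assumptions, not the code LLVM emits: the function names are hypothetical, `.LCPI0` is assumed to hold a splat of the compare constant `c`, and `v0`, `v1`, `x` stand for the incoming `xmm0`, `xmm1`, `xmm2`.

```python
def pminub(a, b):
    """Per-lane unsigned byte minimum, modelled on PMINUB."""
    return [min(p, q) for p, q in zip(a, b)]


def pcmpeqb(a, b):
    """Per-lane equality: 0xFF where equal, 0x00 otherwise, modelled on PCMPEQB."""
    return [0xFF if p == q else 0x00 for p, q in zip(a, b)]


def pblendvb(dst, src, mask):
    """Per-lane select: take src where the mask byte's top bit is set, else dst."""
    return [s if m & 0x80 else d for d, s, m in zip(dst, src, mask)]


def select_if_greater_current(v0, v1, x, c):
    """Model of the listing (assumes .LCPI0 is a splat of c).

    Each result lane is v0 where x > c and v1 where x <= c.
    """
    splat = [c] * len(x)
    le_mask = pcmpeqb(pminub(splat, x), x)  # min(c, x) == x  <=>  x <= c
    return pblendvb(v0, v1, le_mask)        # v1 where x <= c, v0 otherwise
```

The compare itself costs two instructions here (`pminub` and `pcmpeqb`) on top of the constant load and the blend.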
`pblendvb` only tests the most significant bit of each byte of its "mask" input, so a single unsigned saturating add or subtract can emulate the compare, which is one instruction shorter. For `x > C`, the method differs slightly depending on the value of the compare constant `C`:
```
(C < 127): blendv(a, b, addSat(x, 127 - C))
(C > 127): blendv(a, b, subSat(x, C - 127))
```
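
The two cases above can be checked with a small scalar model, again in Python. This is a sketch under stated assumptions rather than a proposed lowering: the function names are hypothetical, `blendv(a, b, mask)` is assumed to pick `b` in lanes where the mask's top bit is set (the `_mm_blendv_epi8` operand convention), and `C == 127`, which the cases above do not list, is assumed to take the add form with an offset of 0.

```python
def paddusb(a, b):
    """Per-lane unsigned saturating byte add, modelled on PADDUSB."""
    return [min(p + q, 0xFF) for p, q in zip(a, b)]


def psubusb(a, b):
    """Per-lane unsigned saturating byte subtract, modelled on PSUBUSB."""
    return [max(p - q, 0) for p, q in zip(a, b)]


def pblendvb(dst, src, mask):
    """Per-lane select: take src where the mask byte's top bit is set, else dst."""
    return [s if m & 0x80 else d for d, s, m in zip(dst, src, mask)]


def select_if_greater_saturating(a, b, x, c):
    """Each result lane is b where x > c (unsigned) and a otherwise."""
    n = len(x)
    if c <= 127:
        # x + (127 - c) reaches 128 or more exactly when x > c.
        # Saturation keeps large x from wrapping back below 128.
        mask = paddusb(x, [127 - c] * n)
    else:
        # x - (c - 127) stays at 128 or more exactly when x > c.
        # Saturation keeps small x from wrapping up past 128.
        mask = psubusb(x, [c - 127] * n)
    return pblendvb(a, b, mask)
```

Only the top bit of each mask byte matters, so the mask does not have to be all-ones or all-zeros as a real compare result would be. Saturation is what makes the trick sound: with a wrapping add or subtract, lanes near the ends of the range would flip the top bit incorrectly.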
This appears to be beneficial only for 8-bit elements, because that width supports both a per-element blendv and saturation arithmetic. (64-bit elements can benefit from a slightly modified version, though: #181454)