| Field | Value |
| --- | --- |
| Issue | 176652 |
| Summary | [X86] manual `avg` optimizes poorly |
| Labels | new issue |
| Assignees | |
| Reporter | folkertdev |

In short, LLVM optimizes `src` into `tgt`, but `tgt` generates much worse code:
https://godbolt.org/z/dWaj6cvYe
```llvm
; with +avx2
define noundef i32 @src(<4 x i64> %x, <4 x i64> %y) {
bb2:
%_4.i = bitcast <4 x i64> %x to <32 x i8>
%0 = zext <32 x i8> %_4.i to <32 x i16>
%_6.i = bitcast <4 x i64> %y to <32 x i8>
%1 = zext <32 x i8> %_6.i to <32 x i16>
%2 = add nuw nsw <32 x i16> %1, %0
%3 = add nuw nsw <32 x i16> %2, splat (i16 1)
%4 = lshr <32 x i16> %3, splat (i16 1)
%5 = trunc nuw <32 x i16> %4 to <32 x i8>
%_0.i = bitcast <32 x i8> %5 to <4 x i64>
%6 = icmp sgt <32 x i8> zeroinitializer, %5
%7 = bitcast <32 x i1> %6 to i32
ret i32 %7
}
define noundef i32 @tgt(<4 x i64> %x, <4 x i64> %y) {
start:
%_4.i = bitcast <4 x i64> %x to <32 x i8>
%0 = zext <32 x i8> %_4.i to <32 x i16>
%_6.i = bitcast <4 x i64> %y to <32 x i8>
%1 = zext <32 x i8> %_6.i to <32 x i16>
%2 = add nuw nsw <32 x i16> %0, splat (i16 1)
%3 = add nuw nsw <32 x i16> %2, %1
%4 = and <32 x i16> %3, splat (i16 256)
%5 = icmp ne <32 x i16> %4, zeroinitializer
%6 = bitcast <32 x i1> %5 to i32
ret i32 %6
}
```
Specifically (the full example and optimization pipeline are here: https://rust.godbolt.org/z/crf61YKMj), `InstCombinePass` turns
```llvm
%4 = lshr <32 x i16> %3, splat (i16 1)
%5 = trunc nuw <32 x i16> %4 to <32 x i8>
%_0.i = bitcast <32 x i8> %5 to <4 x i64>
%6 = icmp sgt <32 x i8> zeroinitializer, %5
```
into
```llvm
%4 = and <32 x i16> %3, splat (i16 256)
%5 = icmp ne <32 x i16> %4, zeroinitializer
```
On its own that does seem better, but the optimization to `avg` is now missed, and the remaining operations are on `<32 x i16>`, which is not a legal type with AVX2, causing terrible codegen:
```asm
src:
vpavgb ymm0, ymm1, ymm0
vpmovmskb eax, ymm0
vzeroupper
ret
tgt:
vpmovzxbw ymm2, xmm0
vextracti128 xmm0, ymm0, 1
vpmovzxbw ymm0, xmm0
vpmovzxbw ymm3, xmm1
vpaddw ymm2, ymm2, ymm3
vextracti128 xmm1, ymm1, 1
vpmovzxbw ymm1, xmm1
vpaddw ymm0, ymm0, ymm1
vpcmpeqd ymm1, ymm1, ymm1
vpsubw ymm2, ymm2, ymm1
vpsubw ymm0, ymm0, ymm1
vpsllw ymm0, ymm0, 7
vpsllw ymm1, ymm2, 7
vpacksswb ymm0, ymm1, ymm0
vpermq ymm0, ymm0, 216
vpmovmskb eax, ymm0
vzeroupper
ret
```
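For reference, the rewrite itself looks sound per lane: the sign bit of the truncated rounding average is bit 7 of `(x + y + 1) >> 1`, which is bit 8 of `x + y + 1`. A minimal scalar sketch that checks this exhaustively is below; the function names `src_lane`, `tgt_lane` and `lanes_agree` are made up for illustration and model a single `i8` lane only, not the vector code or the final `bitcast` to `i32`.

```rust
/// Hypothetical scalar model of one lane of `src`:
/// rounding average in 16 bits, truncate to 8 bits, test the sign bit.
fn src_lane(x: u8, y: u8) -> bool {
    let sum = x as u16 + y as u16 + 1; // at most 511, so no u16 overflow
    let avg = (sum >> 1) as u8;        // at most 255, so the truncation is lossless
    (avg as i8) < 0                    // icmp sgt 0, avg
}

/// Hypothetical scalar model of one lane of `tgt`:
/// test bit 8 of the widened sum directly.
fn tgt_lane(x: u8, y: u8) -> bool {
    let sum = x as u16 + y as u16 + 1;
    (sum & 256) != 0
}

/// Compare both models over all 65536 input pairs.
fn lanes_agree() -> bool {
    (0..=255u8).all(|x| (0..=255u8).all(|y| src_lane(x, y) == tgt_lane(x, y)))
}

fn main() {
    assert!(lanes_agree());
    println!("src and tgt agree on every pair of u8 inputs");
}
```

So the problem is not correctness but that the `tgt` form no longer contains the shift-and-truncate shape that the backend matches to `vpavgb`.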
This was originally reported at https://github.com/rust-lang/rust/issues/124216. https://github.com/llvm/llvm-project/issues/132166 is tangentially related.
I'm not sure whether the `avg` can reasonably be recovered, but it should be possible to do better than what's happening?