| Issue | 183683 |
| --- | --- |
| Summary | [X86] `broadcast16(movemask8(x))` needlessly duplicates the mask as a scalar |
| Labels | new issue |
| Assignees | |
| Reporter | WalterKruger |

Vector broadcasts are typically implemented by moving the scalar value into the lowest element of a vector, which is then shuffled until it occupies all lanes (or by using a dedicated broadcast instruction). However, if that scalar value was obtained specifically through an 8-bit movemask of a 128-bit vector, it is instead first duplicated across a 64-bit scalar register and only then broadcast inside the vector:
```asm
msbFill_SSE2:
pmovmskb eax, xmm0
mov ecx, eax
shl ecx, 16
or ecx, eax
mov rax, rcx
shl rax, 32
or rax, rcx
movq xmm0, rax
pshufd xmm0, xmm0, 68
ret
```
https://godbolt.org/z/e8jd9sfjW
Note that `pshufd` is easily capable of broadcasting a 32-bit element, so the scalar duplication from 32 to 64 bits is not needed. The same thing happens when AVX2 and AVX512-BW are available: they have dedicated 16-bit broadcast instructions, but the 64-bit variants are used instead. This doesn't happen at other movemask sizes, or if the source vector is wider (even if it is cast to 128 bits before the movemask!).
```llvm
define <8 x i16> @msbFill(<16 x i8> %x) {
entry:
%0 = icmp slt <16 x i8> %x, zeroinitializer
%1 = bitcast <16 x i1> %0 to i16
%vecinit.i = insertelement <8 x i16> poison, i16 %1, i64 0
%vecinit7.i = shufflevector <8 x i16> %vecinit.i, <8 x i16> poison, <8 x i32> zeroinitializer
ret <8 x i16> %vecinit7.i
}
```
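
For readers who prefer intrinsics to IR, the following is a minimal C sketch of what the IR above corresponds to. It is an assumption about the shape of the source, not a copy of the code behind the Godbolt link. `msbFill_manual` and `msbFill_check` are hypothetical names added for illustration: the former spells out one shuffle-only sequence (`movd`, `pshuflw`, `pshufd`) that gives the same result without duplicating the mask in a scalar register, and no claim is made that the compiler keeps that sequence as written.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Assumed reproducer shape: take the 16-bit byte mask of x and broadcast it
   to all eight 16-bit lanes. */
__m128i msbFill(__m128i x) {
    return _mm_set1_epi16((short)_mm_movemask_epi8(x));
}

/* Hypothetical hand-written alternative using only vector shuffles. */
__m128i msbFill_manual(__m128i x) {
    __m128i v = _mm_cvtsi32_si128(_mm_movemask_epi8(x)); /* movd: mask into lane 0 */
    v = _mm_shufflelo_epi16(v, 0);                       /* pshuflw: fill low 4 words */
    return _mm_shuffle_epi32(v, 0);                      /* pshufd: fill all 4 dwords */
}

/* Hypothetical self-check: both versions agree on one sample input. */
void msbFill_check(void) {
    uint8_t bytes[16] = {0};
    bytes[0] = 0xFF;   /* MSB set -> mask bit 0 */
    bytes[15] = 0x80;  /* MSB set -> mask bit 15 */
    __m128i x = _mm_loadu_si128((const __m128i *)bytes);

    uint16_t a[8], b[8];
    _mm_storeu_si128((__m128i *)a, msbFill(x));
    _mm_storeu_si128((__m128i *)b, msbFill_manual(x));

    assert(memcmp(a, b, sizeof a) == 0);
    for (int i = 0; i < 8; i++)
        assert(a[i] == 0x8001);
}
```

Comparing the generated assembly of the two functions at `-O2` is one way to see whether the scalar `shl`/`or` duplication is specific to the `_mm_set1_epi16` form.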