[llvm-bugs] [Bug 183683] [X86] `broadcast16(movemask8(x))` needlessly duplicates the mask as a scalar

LLVM Bugs via llvm-bugs Thu, 26 Feb 2026 20:39:34 -0800

Issue	183683
Summary	[X86] `broadcast16(movemask8(x))` needlessly duplicates the mask as a scalar
Labels	new issue
Assignees
Reporter	WalterKruger

    Vector broadcasts are typically implemented by moving the scalar value into the low element of a vector and then shuffling it until it occupies all lanes (or by using a dedicated broadcast instruction). However, if that scalar was obtained through a movemask over the 8-bit elements of a 128-bit vector (`pmovmskb`), it is first duplicated across a 64-bit scalar register and only then broadcast within the vector:


```asm
msbFill_SSE2:
        pmovmskb        eax, xmm0
        mov     ecx, eax
        shl     ecx, 16
        or      ecx, eax
        mov     rax, rcx
        shl     rax, 32
        or      rax, rcx
        movq    xmm0, rax
        pshufd  xmm0, xmm0, 68
        ret
```
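
As a rough illustration (not part of the original report), the scalar portion of that sequence can be modelled in portable C and compared against a plain 16-bit broadcast. The function names below are hypothetical, and the lane extraction assumes the little-endian layout used on x86; this is a sketch of what the instructions compute, not of how LLVM arrives at them.

```c
#include <stdint.h>

/* Scalar model of the emitted duplication: the 16-bit mask is widened
 * to 32 bits, then to 64 bits, before it ever reaches a vector register. */
uint64_t dup_mask_to_64(uint16_t mask) {
    uint32_t lo  = mask;                          /* pmovmskb zero-extends into eax */
    uint32_t d32 = (lo << 16) | lo;               /* shl ecx, 16 ; or ecx, eax */
    uint64_t d64 = ((uint64_t)d32 << 32) | d32;   /* shl rax, 32 ; or rax, rcx */
    return d64;
}

/* Models movq + pshufd with immediate 68 (0b01000100): the low 64 bits
 * are copied into both halves, viewed here as eight 16-bit lanes. */
void broadcast_via_scalar(uint16_t mask, uint16_t out[8]) {
    uint64_t q = dup_mask_to_64(mask);
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)(q >> (16 * (i % 4)));
}

/* What the source asks for: the same 16-bit value in every lane. */
void broadcast_direct(uint16_t mask, uint16_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = mask;
}
```

Both functions fill the eight lanes identically for any mask, which is why the scalar duplication is redundant work.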

https://godbolt.org/z/e8jd9sfjW

Note that `pshufd` can broadcast a 32-bit element directly, so the widening to 64 bits is unnecessary even on SSE2. The same happens when AVX2 or AVX512-BW is available: both have dedicated 16-bit broadcast instructions, but the 64-bit variants are used instead. This doesn't happen at other movemask sizes, or if the source vector is wider (even if it is cast to 128 bits before the movemask!).

```llvm
define <8 x i16> @msbFill(<16 x i8> %x) {
entry:
  %0 = icmp slt <16 x i8> %x, zeroinitializer
  %1 = bitcast <16 x i1> %0 to i16
  %vecinit.i = insertelement <8 x i16> poison, i16 %1, i64 0
  %vecinit7.i = shufflevector <8 x i16> %vecinit.i, <8 x i16> poison, <8 x i32> zeroinitializer
  ret <8 x i16> %vecinit7.i
}
```
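
For readers less familiar with the IR, here is a minimal scalar sketch in C of what the function above computes. It is not taken from the report or the Godbolt link; the names `msb_mask` and `msb_fill_model` are made up for illustration, and the bit order assumes the little-endian `bitcast` behaviour of x86, where vector element 0 becomes bit 0.

```c
#include <stdint.h>

/* icmp slt + bitcast: collect the sign bit of each byte into a 16-bit
 * mask, with byte i mapped to bit i (little-endian assumption). */
uint16_t msb_mask(const int8_t x[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; ++i)
        if (x[i] < 0)
            m |= (uint16_t)(1u << i);
    return m;
}

/* insertelement + shufflevector with a zero mask: splat the 16-bit
 * mask into all eight lanes of the result. */
void msb_fill_model(const int8_t x[16], uint16_t out[8]) {
    uint16_t m = msb_mask(x);
    for (int i = 0; i < 8; ++i)
        out[i] = m;
}
```

Nothing in this model needs a value wider than 16 bits, which matches the report's point that the 32-bit and 64-bit intermediates are an artifact of lowering.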

_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs