https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818
--- Comment #2 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
This is due to the use of `fixed_size_simd`. The type implements an additional ABI guarantee, so that it is safe to use over ABI boundaries (e.g. when passing function arguments between TUs compiled with and without AVX512). It therefore implements masks as *bitmasks*. That's why you see the useless conversion.

Replace:

-using fixed_simd_t = stdx::fixed_size_simd<uint32_t, 8>;
+using fixed_simd_t = stdx::simd<uint32_t, stdx::simd_abi::deduce_t<uint32_t, 8>>;

With AVX2 you should get the expected code-gen.

Wrt. the optimizer, if I had a way to convert vec-mask -> bit-mask -> vec-mask in a way that the compiler knows what I'm doing, I'm sure it would just optimize it away. ;-)

FWIW, the C++26 implementation will not have such an "ABI stable" type anymore and std::simd::vec<uint32_t, 8> will behave as you expected.
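
For illustration, here is a minimal sketch of the suggested replacement. It assumes libstdc++'s <experimental/simd> (GCC 11 or later, C++17). The alias `native_simd_t` and the helpers `clamp_lanes`, `clamped_iota_sum` and `check_equivalence` are hypothetical names invented for this example and do not come from the original report. It only shows that both types are drop-in compatible for a masked operation; it does not by itself demonstrate the code-gen difference.

```cpp
// Sketch only: assumes libstdc++'s <experimental/simd> (GCC 11+, C++17).
#include <cassert>
#include <cstdint>
#include <experimental/simd>

namespace stdx = std::experimental;

// ABI-stable type named in the comment above.
using fixed_simd_t = stdx::fixed_size_simd<std::uint32_t, 8>;
// Suggested replacement: let the library deduce the ABI tag for 8 lanes.
// (native_simd_t is a hypothetical alias name used only in this sketch.)
using native_simd_t =
    stdx::simd<std::uint32_t, stdx::simd_abi::deduce_t<std::uint32_t, 8>>;

static_assert(fixed_simd_t::size() == 8);
static_assert(native_simd_t::size() == 8);

// Hypothetical helper: clamp every lane to `limit` using a masked assignment.
// The comparison `v > limit` produces the simd_mask discussed in the comment.
template <class V>
V clamp_lanes(V v, typename V::value_type limit)
{
    stdx::where(v > limit, v) = limit;
    return v;
}

// Hypothetical helper: fill lanes with 0..7, clamp them, and sum the lanes.
template <class V>
std::uint32_t clamped_iota_sum(std::uint32_t limit)
{
    V v([](auto i) { return static_cast<std::uint32_t>(i); });
    return stdx::reduce(clamp_lanes(v, limit));
}

// Hypothetical check: both types must compute the same value.
inline void check_equivalence()
{
    assert(clamped_iota_sum<fixed_simd_t>(4) ==
           clamped_iota_sum<native_simd_t>(4));
}
```

Both instantiations compute the same result (with a limit of 4 the lanes become 0, 1, 2, 3, 4, 4, 4, 4). To look at the code-gen claim itself, one option is to compile the two instantiations with something like `g++ -std=c++17 -O2 -mavx2` and compare the generated assembly.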
