https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818

--- Comment #2 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
This is due to the use of `fixed_size_simd`. The type implements an additional
ABI guarantee, so that it is safe to use over ABI boundaries (e.g. when passing
function arguments between TUs compiled with and without AVX512). It therefore
implements masks as *bitmasks*. That's why you see the useless conversion.

Replace:

-using fixed_simd_t  = stdx::fixed_size_simd<uint32_t, 8>;
+using fixed_simd_t  = stdx::simd<uint32_t, stdx::simd_abi::deduce_t<uint32_t, 8>>;

With AVX2 you should get the expected code-gen.
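
For illustration, here is a minimal sketch of the two aliases side by side. It assumes libstdc++'s <experimental/simd> and C++17 or later. The names `native8_simd_t`, `count_greater` and `demo` are made up for this example and are not from the bug report. Both types have 8 lanes and produce the same results; only the mask representation, and therefore the generated code, should differ.

```cpp
#include <experimental/simd>
#include <cassert>
#include <cstdint>

namespace stdx = std::experimental;

// ABI-stable type: safe to pass between TUs built with different -m flags.
using fixed_simd_t = stdx::fixed_size_simd<std::uint32_t, 8>;

// ABI tag deduced from the flags of the current TU (hypothetical alias name).
using native8_simd_t =
    stdx::simd<std::uint32_t, stdx::simd_abi::deduce_t<std::uint32_t, 8>>;

static_assert(fixed_simd_t::size() == 8 && native8_simd_t::size() == 8,
              "both types hold 8 lanes");

// Hypothetical helper: count the lanes of data[0..7] greater than threshold.
// The comparison yields V::mask_type, which is where the representation of
// the mask matters for code-gen.
template <class V>
int count_greater(const std::uint32_t* data, std::uint32_t threshold) {
    V v(data, stdx::element_aligned);   // load 8 elements, no alignment assumed
    return stdx::popcount(v > V(threshold));
}

// Usage check: both types must agree on the result.
inline void demo() {
    const std::uint32_t data[8] = {1, 9, 3, 7, 5, 5, 8, 2};
    assert(count_greater<fixed_simd_t>(data, 4) ==
           count_greater<native8_simd_t>(data, 4));
}
```

Comparing the assembly of the two `count_greater` instantiations at -O2 -mavx2 is one way to check whether the mask conversion is gone.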

Wrt. the optimizer, if I had a way to convert vec-mask -> bit-mask -> vec-mask
in a way that the compiler knows what I'm doing, I'm sure it would just
optimize it away. ;-)

FWIW, the C++26 implementation will not have such an "ABI stable" type anymore
and std::simd::vec<uint32_t, 8> will behave as you expected.
