https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86267
--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
Sorry for the delay. Vacation...

This pattern appears in many variations in the implementation of wg21.link/p0214r9. The fixed_size<N> ABI tag used with a simd_mask type requires the implementer to decide whether to store the mask unconditionally as a bitmask or as one or more vector masks. (An array of bools is another choice, but never a good fit.) Thanks to AVX512, the native mask representation on x86 "depends". Any choice for simd_mask<T, fixed_size<N>> leads to bitmask <-> vector mask conversions.

GCC decided to implement compares of vector builtins to unconditionally return vector masks, even if an AVX512 compare instruction is used. The optimizer then sometimes recognizes the conversion back to a bitmask and eliminates the conversions. Consequently, fixed_size simd_masks currently achieve better optimization when implemented as vector masks.

Through this PR, I want to find out whether using bitmasks is a feasible solution. I understand the pain involved in making this work generically. That's why I'm suggesting to support this optimization only when a special conversion builtin is used. Thus, GCC wouldn't have to recognize all possible patterns to convert bitmask <-> vector mask. And, through the use of __builtin_vector_to_bitmask, the caller implies that the argument is a vector mask (every other input is UB).
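
To make the two representations concrete, here is a minimal sketch using the GCC/Clang vector extensions. The helper names vector_to_bitmask, bitmask_to_vector and compare_less_bitmask are hypothetical and written only for illustration; they are not the suggested builtin, and the scalar loops stand in for whatever a compiler would actually emit. The sketch assumes four 32-bit lanes and that int32_t is int on the target.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang vector extension types: four 32-bit lanes in 16 bytes.
typedef float   float4 __attribute__((vector_size(16)));
typedef int32_t int4   __attribute__((vector_size(16)));

// Hypothetical helper (not a GCC builtin): vector mask -> bitmask.
// Each lane must be all-zeros (0) or all-ones (-1); the assert documents
// that precondition, which the suggested builtin would treat as UB.
inline unsigned vector_to_bitmask(int4 m) {
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i) {
        assert(m[i] == 0 || m[i] == -1);
        bits |= static_cast<unsigned>(m[i] != 0) << i;  // lane i -> bit i
    }
    return bits;
}

// Hypothetical helper: bitmask -> vector mask (bit i -> lane i).
inline int4 bitmask_to_vector(unsigned bits) {
    int4 m = {0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        m[i] = ((bits >> i) & 1u) ? -1 : 0;
    return m;
}

// A compare on vector-extension types yields a vector mask (0 or -1 per
// lane). A bitmask-based simd_mask would have to convert it afterwards.
inline unsigned compare_less_bitmask(float4 a, float4 b) {
    int4 m = a < b;               // vector mask
    return vector_to_bitmask(m);  // conversion the optimizer would need to fold
}
```

With a dedicated conversion builtin in place of the hypothetical vector_to_bitmask, the compiler would only need to match that one call directly after a compare, rather than every hand-written loop or shuffle that performs the same conversion.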