https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86267

--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
Sorry for the delay. Vacation...

This pattern appears in many variations in the implementation of
wg21.link/p0214r9. For a simd_mask type that uses the fixed_size<N> ABI tag,
the implementer has to decide whether to store the mask unconditionally as a
bitmask or as one or more vector masks. (An array of bools is another choice,
but never a good fit.)
Thanks to AVX512, the native mask representation on x86 "depends" on the
target. Any choice for simd_mask<T, fixed_size<N>> leads to bitmask <-> vector
mask conversions. GCC implements compares on vector builtins so that they
unconditionally return vector masks, even if an AVX512 compare instruction is
used. The optimizer then sometimes recognizes the conversion back to a bitmask
and eliminates both conversions. Consequently, fixed_size simd_masks currently
achieve better optimization when implemented as vector masks. Through this PR,
I want to find out whether using bitmasks is a feasible solution.
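
To make the two representations concrete, here is a minimal sketch using the
GCC vector extensions. The helper names (less_mask, to_bitmask,
to_vector_mask) and the 4 x 32-bit lane layout are assumptions chosen for
illustration; they are not taken from the P0214 implementation or from GCC.

```cpp
// GCC/Clang vector extension types: four 32-bit lanes (16 bytes).
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

// A compare on vector builtins yields a vector mask:
// each lane is -1 (all bits set) where true and 0 where false.
inline v4si less_mask(v4sf a, v4sf b) { return a < b; }

// Hypothetical helper: vector mask -> bitmask.
// Bit i of the result is set iff lane i of the mask is nonzero.
inline unsigned to_bitmask(v4si m) {
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= (m[i] != 0 ? 1u : 0u) << i;
    return bits;
}

// Hypothetical helper: bitmask -> vector mask.
// Lane i becomes -1 if bit i is set, 0 otherwise.
inline v4si to_vector_mask(unsigned bits) {
    v4si m = {0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        m[i] = ((bits >> i) & 1u) ? -1 : 0;
    return m;
}
```

A simd_mask stored as a bitmask would call something like to_bitmask after
every compare and to_vector_mask before every blend, which is the round trip
the optimizer has to see through.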

I understand the pain involved in making this work generically. That's why I'm
suggesting that this optimization be supported only when a special conversion
builtin is used. That way, GCC wouldn't have to recognize all possible patterns
for converting bitmask <-> vector mask. Also, by using
__builtin_vector_to_bitmask, the caller asserts that the argument is a vector
mask (any other input is UB).
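
As a rough illustration of what that precondition buys, the following is a
plain C++ stand-in, not the proposed builtin itself. The name
vector_to_bitmask_sketch, the 4 x 32-bit lane layout, and the assumption of a
32-bit int are mine. Because every lane is promised to be 0 or -1, reading only
the sign bit of each lane is sufficient.

```cpp
// GCC/Clang vector extension type: four 32-bit int lanes (assumes 32-bit int).
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical stand-in for the proposed conversion builtin.
// Precondition: every lane of m is either 0 or -1 (a proper vector mask).
// Other inputs are outside the contract; the proposal makes them UB.
inline unsigned vector_to_bitmask_sketch(v4si m) {
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i)
        // Only the sign bit of each lane is inspected.
        bits |= (static_cast<unsigned>(m[i]) >> 31) << i;
    return bits;
}
```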
