https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> (In reply to Hongtao Liu from comment #13)
> > >
> > > For XOR cstorem4 isn't of help, but if we can get a scalar bit mask we
> > > can use popcount&1 here. Targets with separate vector modes for masks
> > > can use reduc_{and,ior,xor}_scal but on x86 with either integer vector modes
> > > or integer scalar modes that's going to be difficult. A more explicit
> > > reduc_mask_{and,ior,xor}_scal would be better there.
> >
> > Yes, indeed, x86 can use vpmovmskb/kmov to convert vector mask to scalar and
> > then popcnt&1, those implementation can all be done in the backend
> > expander.
>
> But ouch, for two and four bit masks we have all QImode, so
> reduc_mask_and_scal_qi doesn't work for them. For IOR and XOR it should
We have vec_pack_sbool_trunc_qi and vec_pack_trunc_qi to handle a similar issue:
vec_pack_sbool_trunc_qi accepts an additional CONST_INT operand that gives the
number of elements in the output vector.
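
To illustrate why an element-count operand would matter for the mask
reductions discussed above, here is a minimal scalar model in C. It is only a
sketch, not GCC expander code: the helper names and the nelts parameter are
hypothetical, nelts stands in for the extra element-count operand, and
__builtin_popcount is the GCC/Clang builtin. It assumes the mask has already
been moved into an 8-bit scalar (for example via kmov), with only the low
nelts bits meaningful.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low nelts bits of a mask held in an 8-bit (QImode-like)
   container.  nelts is a hypothetical stand-in for an explicit
   element-count operand. */
static inline uint8_t mask_live_bits(uint8_t mask, unsigned nelts)
{
    assert(nelts >= 1 && nelts <= 8);
    return (uint8_t)(mask & ((1u << nelts) - 1u));
}

/* AND reduction: true only if every one of the nelts live bits is set.
   Without nelts, a 2- or 4-bit mask could not be told apart from an
   8-bit one, so the comparison value would be unknown. */
static inline int mask_reduce_and(uint8_t mask, unsigned nelts)
{
    return mask_live_bits(mask, nelts) == mask_live_bits(0xff, nelts);
}

/* IOR reduction: true if any live bit is set. */
static inline int mask_reduce_ior(uint8_t mask, unsigned nelts)
{
    return mask_live_bits(mask, nelts) != 0;
}

/* XOR reduction: parity of the live bits, i.e. popcount & 1. */
static inline int mask_reduce_xor(uint8_t mask, unsigned nelts)
{
    return __builtin_popcount(mask_live_bits(mask, nelts)) & 1;
}
```

In this model the AND case cannot be computed without nelts, because the
all-ones comparison value depends on it. IOR and XOR need it only when the
unused upper bits of the container are not guaranteed to be zero.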