https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #13)
> >
> > For XOR cstorem4 isn't of help, but if we can get a scalar bit mask we
> > can use popcount&1 here. Targets with separate vector modes for masks
> > can use reduc_{and,ior,xor}_scal but on x86 with either integer vector modes
> > or integer scalar modes that's going to be difficult. A more explicit
> > reduc_mask_{and,ior,xor}_scal would be better there.
>
> Yes, indeed, x86 can use vpmovmskb/kmov to convert vector mask to scalar and
> then popcnt&1, those implementation can all be done in the backend expander.
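As a rough illustration of the scalar-mask idea in the quoted text, the sketch below models in plain C what an expander might compute once the vector compare result has been moved into a general register (for example with vpmovmskb or kmov). The function names are hypothetical, bit i of the mask is assumed to hold the result of lane i, and the excess bits above the lane count are assumed to be zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not GCC internals: reduce a scalar lane mask in
   which bit i holds the boolean result of lane i.  Assumes all bits at
   or above the lane count are zero.  */

/* IOR reduction: true if any lane is set.  */
static bool reduc_mask_ior(uint32_t mask)
{
  return mask != 0;
}

/* XOR reduction: parity of the set lanes, i.e. popcount & 1.  */
static bool reduc_mask_xor(uint32_t mask)
{
  return __builtin_popcount(mask) & 1;
}

/* AND reduction: all NLANES lanes set (requires nlanes < 32).  */
static bool reduc_mask_and(uint32_t mask, unsigned nlanes)
{
  uint32_t all = (1u << nlanes) - 1;
  return mask == all;
}
```

Unlike IOR and XOR, the AND helper needs the lane count to know which value means "all true"; a pattern keyed only on the register mode does not carry that information.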
But ouch, two- and four-bit masks all use QImode, so
reduc_mask_and_scal_qi doesn't work for them (zeroed excess bits would make
the AND come out false). For IOR and XOR it should work as long as the excess
mask bits are reliably zero. This shows up for the following testcase with
-mprefer-vector-width=256, since we then get V4DImode data (four lanes, hence
a four-bit mask). I'm not sure what to do here given that the machine
description is keyed on modes (and there can be only one PQImode, for
example). Any ideas? We could possibly have the vectorizer ensure zero- or
sign-extension of integer-mode masks (by means of shifts/ands).
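Following the shift/and idea above, the sketch below models how 2- and 4-lane masks held in an 8-bit register could be normalized so that a single QImode pattern works for each reduction: zero-extension makes IOR and XOR width-independent, and sign-extension from the top lane makes AND a plain compare against all-ones (-1 in QImode), because the excess bits become ones exactly when the top lane is set. All names are hypothetical and uint8_t merely stands in for a QImode mask register; this is not GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: an NLANES-lane mask (NLANES = 2 or 4) lives in an
   8-bit register whose bits >= NLANES may hold garbage.  */

/* Zero-extend: clear the excess bits with an AND.  */
static uint8_t mask_zext(uint8_t mask, unsigned nlanes)
{
  return mask & (uint8_t)((1u << nlanes) - 1);
}

/* Sign-extend from bit NLANES-1 using the (x ^ s) - s identity, done in
   unsigned arithmetic so it is well defined in portable C.  */
static uint8_t mask_sext(uint8_t mask, unsigned nlanes)
{
  unsigned s = 1u << (nlanes - 1);
  return (uint8_t)((mask_zext(mask, nlanes) ^ s) - s);
}

/* On zero-extended masks, IOR and XOR ignore the lane count.  */
static bool reduc_ior_qi(uint8_t mask) { return mask != 0; }
static bool reduc_xor_qi(uint8_t mask) { return __builtin_popcount(mask) & 1; }

/* On sign-extended masks, AND is a compare against all-ones.  */
static bool reduc_and_qi(uint8_t mask) { return mask == 0xff; }
```

A target would more likely implement the sign-extension as a left shift followed by an arithmetic right shift; the XOR/subtract form is used here only because it avoids implementation-defined behavior in C.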

#include <stdbool.h>

bool f(long * p, long n)
{
  bool r = true;
  for(long i = 0; i < 16; ++i)
    r &= (p[i] != 0);
  return r;
}
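To make the four-lane case of the testcase concrete, the sketch below emulates it in scalar C with 4 lanes per vector (as with V4DImode): each chunk compare produces a 4-bit mask in an 8-bit variable, the chunk masks are ANDed lane-wise, and the final AND reduction has to compare against 0x0f rather than 0xff. The function name f_model is hypothetical, and the chunking is an assumption for illustration, not a description of the code the vectorizer actually emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of the testcase with 4 lanes per vector:
   each 4-element compare yields a 4-bit mask held in an 8-bit
   (QImode-like) variable.  */
static bool f_model(const long *p)
{
  uint8_t acc = 0x0f;                 /* all 4 lanes start out true */
  for (long i = 0; i < 16; i += 4)
    {
      uint8_t m = 0;
      for (int lane = 0; lane < 4; ++lane)
        m |= (uint8_t)((p[i + lane] != 0) << lane);
      acc &= m;                       /* lane-wise AND of chunk masks */
    }
  /* Final AND reduction: bits 4..7 are zero here, so "all true" is
     0x0f; a QImode compare against 0xff would always fail.  */
  return acc == 0x0f;
}
```

For any 16-element array this agrees with the scalar loop in f: it returns true only when every element is nonzero.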