https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #22)
> (In reply to Richard Biener from comment #21)
> > (In reply to Hongtao Liu from comment #20)
> > > (In reply to Hongtao Liu from comment #19)
> > > > Created attachment 62562 [details]
> > > > avx512/avx2 reduc_mask_{and,ixor,xor}_m
> > > >
> > > > I didn't support V*HImode for reduc_mask_xor_m since x86 only has
> > > > vmovmskps/pd and vpmovmskb. for others, unit test looks ok and I'm
> > > > going to have more test for that.
> > >
> > > It failed bootstrap in stage3 with --with-arch=native on SPR, need to
> > > take a look.
> >
> > It might very well be a bug on the vectorizer side of course.
>
> should be related to reduc_mask_and, the mask needs to be compared to
> allones(-1), not zero, since any zero bit will cause the result to be zero.

Yes, I also see the new gcc.dg/vect/vect-reduc-bool-1.c fail execution with
AVX2. For an AND reduction of 16 char elements we create
        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpeqb        (%rdi), %xmm1, %xmm0
        vpcmpeqb        %xmm1, %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        sete    %al
clang produces
        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpeqb        (%rdi), %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        testl   %eax, %eax
        sete    %al
It seems tricky to mate a != 0 compare with the all-zero vptest optimally,
but GCC's sequence above is wrong in any case.
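
To make the difference concrete, here is a minimal scalar sketch in portable C. It assumes the reduction in the testcase has the form r &= (a[i] != 0) over 16 chars (the testcase source is not quoted here, so that is a guess), and it models the vector mask as a 16-bit integer with one bit per lane. All function names are hypothetical and only for illustration; the "buggy" variant reflects how I read the GCC sequence above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NELTS 16

/* Bit i is set when a[i] != 0: a scalar stand-in for the "!= 0" lane mask. */
uint16_t nonzero_mask(const unsigned char *a)
{
    uint16_t m = 0;
    for (int i = 0; i < NELTS; i++)
        if (a[i] != 0)
            m |= (uint16_t)(1u << i);
    return m;
}

/* Reference: the plain scalar AND reduction (assumed shape of the testcase). */
bool and_reduc_scalar(const unsigned char *a)
{
    bool r = true;
    for (int i = 0; i < NELTS; i++)
        r &= (a[i] != 0);
    return r;
}

/* AND reduction on the mask: true only if every lane is set (all-ones). */
bool and_reduc_mask_allones(const unsigned char *a)
{
    return nonzero_mask(a) == 0xFFFFu;
}

/* Equivalent form, as in the clang sequence: the "== 0" mask must be empty. */
bool and_reduc_inverted_zero(const unsigned char *a)
{
    uint16_t zero_mask = (uint16_t)~nonzero_mask(a);
    return zero_mask == 0;
}

/* Model of the GCC sequence: tests the "!= 0" mask against zero instead. */
bool and_reduc_mask_zero_buggy(const unsigned char *a)
{
    return nonzero_mask(a) == 0;
}

/* Small self-check over three inputs. */
void demo_checks(void)
{
    unsigned char all_set[NELTS], mixed[NELTS], all_zero[NELTS] = {0};
    for (int i = 0; i < NELTS; i++) {
        all_set[i] = 1;
        mixed[i] = (unsigned char)(i != 3);   /* exactly one zero lane */
    }

    /* The two correct forms agree with the scalar loop on every input. */
    assert(and_reduc_scalar(all_set) && and_reduc_mask_allones(all_set)
           && and_reduc_inverted_zero(all_set));
    assert(!and_reduc_scalar(mixed) && !and_reduc_mask_allones(mixed)
           && !and_reduc_inverted_zero(mixed));
    assert(!and_reduc_scalar(all_zero) && !and_reduc_mask_allones(all_zero)
           && !and_reduc_inverted_zero(all_zero));

    /* The compare-against-zero form is wrong on both extremes. */
    assert(!and_reduc_mask_zero_buggy(all_set));
    assert(and_reduc_mask_zero_buggy(all_zero));
}
```

Under this model the compare-against-zero form returns the wrong answer for the all-nonzero and all-zero inputs and only agrees with the scalar loop on mixed inputs, which is consistent with needing a compare against all-ones (or a zero test on the inverted mask) for the AND case.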