https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #22)
> (In reply to Richard Biener from comment #21)
> > (In reply to Hongtao Liu from comment #20)
> > > (In reply to Hongtao Liu from comment #19)
> > > > Created attachment 62562 [details]
> > > > avx512/avx2 reduc_mask_{and,ixor,xor}_m
> > > > 
> > > > I didn't support V*HImode for reduc_mask_xor_m since x86 only has
> > > > vmovmskps/pd and vpmovmskb. for others, unit test looks ok and I'm going to
> > > > have more test for that.
> > > 
> > > It failed bootstrap in stage3 with --with-arch=native on SPR, need to take a
> > > look.
> > 
> > It might very well be a bug on the vectorizer side of course.
> 
> should be related to reduc_mask_and, the mask needs to be compared to
> allones(-1),  no zero since any zero bit will cause the result to be zero.

Yes, I also see the new gcc.dg/vect/vect-reduc-bool-1.c fail execution with
AVX2.  For an AND reduction of 16 char elements we create

        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpeqb        (%rdi), %xmm1, %xmm0
        vpcmpeqb        %xmm1, %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        sete    %al

clang produces

        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpeqb        (%rdi), %xmm0, %xmm0
        vpmovmskb       %xmm0, %eax
        testl   %eax, %eax
        sete    %al

It seems tricky to mate a != 0 compare with the all-zero vptest optimally,
but GCC's sequence above is wrong in any case.
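
To make the difference between the two sequences concrete, here is a rough
scalar model in C.  It is only a sketch: it assumes the reduction in question
is the AND of (a[i] != 0) over 16 bytes (inferred from the != 0 compare, the
testcase itself is not quoted here), and the helper names cmpeqb, vptest_sete,
movmskb, gcc_sequence and clang_sequence are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum { N = 16 };  /* 16 char elements, as in the comment above */

/* Scalar reference: AND reduction of (a[i] != 0).  Assumed semantics. */
static bool and_reduc_ref(const uint8_t a[N])
{
  bool r = true;
  for (int i = 0; i < N; i++)
    r = r && (a[i] != 0);
  return r;
}

/* Model of vpcmpeqb: 0xFF where the bytes are equal, 0x00 elsewhere. */
static void cmpeqb(const uint8_t x[N], const uint8_t y[N], uint8_t out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = (x[i] == y[i]) ? 0xFF : 0x00;
}

/* Model of "vptest x, x; sete": true iff every byte of x is zero. */
static bool vptest_sete(const uint8_t x[N])
{
  for (int i = 0; i < N; i++)
    if (x[i] != 0)
      return false;
  return true;
}

/* Model of vpmovmskb: gather the top bit of each byte into an integer. */
static unsigned movmskb(const uint8_t x[N])
{
  unsigned m = 0;
  for (int i = 0; i < N; i++)
    m |= (unsigned) (x[i] >> 7) << i;
  return m;
}

/* The GCC sequence: builds the (a[i] != 0) mask, then tests it for
   all-zero.  That answers "no element is nonzero", not "all are".  */
static bool gcc_sequence(const uint8_t a[N])
{
  const uint8_t zero[N] = { 0 };
  uint8_t eq0[N], ne0[N];
  cmpeqb(a, zero, eq0);    /* a[i] == 0 */
  cmpeqb(eq0, zero, ne0);  /* a[i] != 0 */
  return vptest_sete(ne0);
}

/* The clang sequence: tests that the (a[i] == 0) mask has no bit set,
   i.e. every element is nonzero.  */
static bool clang_sequence(const uint8_t a[N])
{
  const uint8_t zero[N] = { 0 };
  uint8_t eq0[N];
  cmpeqb(a, zero, eq0);    /* a[i] == 0 */
  return movmskb(eq0) == 0;
}
```

In this model the GCC sequence disagrees with the reference both for an
all-nonzero input and for an all-zero input, and only agrees by accident when
zero and nonzero elements are mixed.  The clang sequence agrees in all three
cases.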
