https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #27 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> > 
> > seems it's tricky to mate a != 0 compare with the all-zero vptest optimally,
> It could possibly be handled in combine; we already have ptest for CCZ and
> CCC separately. If only CCZ is cared about, then (unspec:CCZ (eq (eq op const0)
> const0) unspec_ptest) can be simplified.
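
A lane-level sketch of why that simplification is valid when only the zero
flag matters. This is a scalar model in plain C, not GCC code: the helper
names and the 8 x 32-bit lane layout are assumptions made for illustration.
Comparing with zero twice yields a mask of the nonzero lanes, and that mask is
all-zero exactly when the original operand is all-zero, so ZF is the same with
or without the two compares.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumed: eight 32-bit lanes, one 256-bit ymm register */

/* Models the ZF result of "vptest x, x": 1 iff every bit of x is zero. */
int ptest_zf(const uint32_t x[LANES])
{
    uint32_t any_bits = 0;
    for (int i = 0; i < LANES; i++)
        any_bits |= x[i];
    return any_bits == 0;
}

/* Models vpcmpeqd against zero: a lane becomes all-ones iff it was zero. */
void cmpeq_zero(const uint32_t in[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (in[i] == 0) ? UINT32_MAX : 0;
}

/* Models ZF of ptest applied to (eq (eq op const0) const0). */
int ptest_zf_after_double_compare(const uint32_t op[LANES])
{
    uint32_t is_zero[LANES];
    uint32_t is_nonzero[LANES];
    cmpeq_zero(op, is_zero);          /* inner eq: mask of zero lanes */
    cmpeq_zero(is_zero, is_nonzero);  /* outer eq: mask of nonzero lanes */
    return ptest_zf(is_nonzero);
}

/* Hypothetical self-check: both forms give the same ZF for any input. */
void check_ccz_equivalence(const uint32_t op[LANES])
{
    assert(ptest_zf_after_double_compare(op) == ptest_zf(op));
}
```

The model covers ZF only. CF is computed differently by ptest, which is why
the simplification is restricted to the CCZ case.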

For reduc_mask_ior, the code can be further optimized to the following under AVX2:


        .cfi_startproc
        vmovdqu (%rdi), %ymm0
        vptest  %ymm0, %ymm0
        setne   %al
        vzeroupper

But for reduc_mask_and, it's


        .cfi_startproc
        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpeqd        (%rdi), %ymm1, %ymm0
        vpcmpeqd        %ymm1, %ymm0, %ymm0
        vpcmpeqd        %ymm1, %ymm1, %ymm1
        vpxor   %ymm1, %ymm0, %ymm0
        vptest  %ymm0, %ymm0
        sete    %al



vs clang

        vpxor   xmm0, xmm0, xmm0
        vpcmpeqd        ymm0, ymm0, ymmword ptr [rdi]
        vmovmskps       eax, ymm0
        test    eax, eax
        sete    al

This is hard to fix in combine.
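
To make the comparison of the two reduc_mask_and sequences concrete, here is a
scalar sketch in plain C. It is a model read off the instruction sequences
quoted above, not the original test case: the function names and the
8 x 32-bit lane layout are assumptions. Both forms return 1 exactly when no
lane is zero, but the ptest form needs two extra compares and an xor to get
there.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumed: eight 32-bit lanes, one 256-bit ymm register */

/* Models the longer sequence: compare, compare, invert, vptest, sete. */
int and_reduce_ptest_style(const uint32_t v[LANES])
{
    uint32_t any_bits = 0;
    for (int i = 0; i < LANES; i++) {
        uint32_t is_zero = (v[i] == 0) ? UINT32_MAX : 0;        /* vpcmpeqd with zero */
        uint32_t is_nonzero = (is_zero == 0) ? UINT32_MAX : 0;  /* vpcmpeqd with zero again */
        uint32_t inverted = is_nonzero ^ UINT32_MAX;            /* vpxor with all-ones */
        any_bits |= inverted;                                   /* vptest looks at all bits */
    }
    return any_bits == 0;                                       /* sete: ZF set */
}

/* Models the shorter sequence: compare, vmovmskps, test, sete. */
int and_reduce_movmsk_style(const uint32_t v[LANES])
{
    unsigned mask = 0;
    for (int i = 0; i < LANES; i++) {
        uint32_t is_zero = (v[i] == 0) ? UINT32_MAX : 0;  /* vpcmpeqd with zero */
        mask |= (unsigned)(is_zero >> 31) << i;           /* vmovmskps: one sign bit per lane */
    }
    return mask == 0;                                     /* test + sete */
}

/* Hypothetical self-check: the two models agree for any input. */
void check_and_reduce_agree(const uint32_t v[LANES])
{
    assert(and_reduce_ptest_style(v) == and_reduce_movmsk_style(v));
}
```

In this model the second compare and the xor cancel each other out, so the
ptest form could in principle test the first compare result directly.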
