https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #28 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #27)
> > > 
> > > seems it's tricky to mate a != 0 compare with the all-zero vptest 
> > > optimally,
> > It could possibly be handled in combine: we already have ptest for CCZ and
> > CCC separately, so if only CCZ is needed, then (unspec:CCZ (eq (eq op const0)
> > const0) unspec_ptest) can be simplified.
> 
> for reduc_mask_ior, it can be further optimized to below under avx2.
> 
> 
>         .cfi_startproc
>         vmovdqu (%rdi), %ymm0
>         vptest  %ymm0, %ymm0
>         setne   %al
>         vzeroupper
> 
> But for reduc_mask_and, it's
> 
> 
>         .cfi_startproc
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpcmpeqd        (%rdi), %ymm1, %ymm0
>         vpcmpeqd        %ymm1, %ymm0, %ymm0
>         vpcmpeqd        %ymm1, %ymm1, %ymm1
>         vpxor   %ymm1, %ymm0, %ymm0
>         vptest  %ymm0, %ymm0
>         sete    %al
> 
> 
> 
> vs clang
> 
>         vpxor   xmm0, xmm0, xmm0
>         vpcmpeqd        ymm0, ymm0, ymmword ptr [rdi]
>         vmovmskps       eax, ymm0
>         test    eax, eax
>         sete    al
> 
> hard to fix it in the combine
Wait, we can use (eq op 0) instead of (xor op constm1) to check whether op is
all-ones. For a mask vector the two are equivalent, so combine should then be
able to handle it.
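
A rough scalar model of the idea, in plain C, to show why the two forms give
the same result for reduc_mask_and. The helper names (lanes_eq_zero, lanes_not,
all_zero, reduc_and_via_xor, reduc_and_via_eq) and the 8 x 32-bit lane layout
are assumptions made for illustration; this is not GCC code and not the actual
combine pattern.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* eight 32-bit lanes, as in one ymm register */

/* Scalar model of vpcmpeqd against zero: all-ones where the lane is 0. */
static void lanes_eq_zero(const int32_t *in, int32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = (in[i] == 0) ? -1 : 0;
}

/* Scalar model of vpxor with an all-ones vector (xor op constm1). */
static void lanes_not(const int32_t *in, int32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = in[i] ^ -1;
}

/* Scalar model of "vptest x, x; sete": 1 if every bit of x is zero. */
static int all_zero(const int32_t *x)
{
    int32_t acc = 0;
    for (int i = 0; i < LANES; i++)
        acc |= x[i];
    return acc == 0;
}

/* Mirrors the quoted GCC sequence: two compares, a NOT, then the test. */
static int reduc_and_via_xor(const int32_t *a)
{
    int32_t t0[LANES], t1[LANES], t2[LANES];
    lanes_eq_zero(a, t0);  /* a == 0                */
    lanes_eq_zero(t0, t1); /* mask = (a != 0)       */
    lanes_not(t1, t2);     /* ~mask                 */
    return all_zero(t2);   /* mask is all-ones?     */
}

/* Uses (eq mask 0) in place of (xor mask constm1). Since
   (eq (eq (eq a 0) 0) 0) is just (eq a 0) lane by lane, a single
   compare is enough before the test. */
static int reduc_and_via_eq(const int32_t *a)
{
    int32_t t0[LANES];
    lanes_eq_zero(a, t0);
    return all_zero(t0);
}

/* Both forms must agree for any input. */
static void check_equivalent(const int32_t *a)
{
    assert(reduc_and_via_xor(a) == reduc_and_via_eq(a));
}
```

The equivalence relies on every lane of the mask being either 0 or -1, which
holds for the result of a vector compare; it would not hold for an arbitrary
vector operand.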
