https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #28 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #27)
> > >
> > > seems it's tricky to mate a != 0 compare with the all-zero vptest
> > > optimally,
> > It could possibly be handled in combine. We already have ptest for CCZ and
> > CCC separately; if only CCZ is cared about, then (unspec:CCZ (eq (eq op const0)
> > const0) unspec_ptest) can be simplified.
>
> For reduc_mask_ior, it can be further optimized to the code below under AVX2:
>
>         .cfi_startproc
>         vmovdqu (%rdi), %ymm0
>         vptest  %ymm0, %ymm0
>         setne   %al
>         vzeroupper
>
> But for reduc_mask_and, it's
>
>         .cfi_startproc
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpcmpeqd        (%rdi), %ymm1, %ymm0
>         vpcmpeqd        %ymm1, %ymm0, %ymm0
>         vpcmpeqd        %ymm1, %ymm1, %ymm1
>         vpxor   %ymm1, %ymm0, %ymm0
>         vptest  %ymm0, %ymm0
>         sete    %al
>
> vs clang
>
>         vpxor   xmm0, xmm0, xmm0
>         vpcmpeqd        ymm0, ymm0, ymmword ptr [rdi]
>         vmovmskps       eax, ymm0
>         test    eax, eax
>         sete    al
>
> It is hard to fix this in combine.

Wait, we can use (eq op 0) instead of (xor op constm1) to check whether op is
all-ones; then it should be handled.
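
To illustrate why (eq op 0) can stand in for (xor op constm1) here, the following is a minimal scalar sketch in C, not GCC code. It assumes op is a comparison mask, so every 32-bit lane is either 0 or -1; under that assumption both forms produce an all-zero vector exactly when op is all-ones. The names lane_eq, lane_xor, all_zero, allones_via_xor and allones_via_eq are hypothetical helpers that model vpcmpeqd, vpxor and "vptest x, x; sete" lane by lane.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* eight 32-bit lanes, as in one ymm register */

/* Lane-wise (eq a b): -1 where the lanes are equal, 0 otherwise
   (a scalar model of vpcmpeqd). */
static void lane_eq(const int32_t *a, const int32_t *b, int32_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (a[i] == b[i]) ? -1 : 0;
}

/* Lane-wise (xor a b) (a scalar model of vpxor). */
static void lane_xor(const int32_t *a, const int32_t *b, int32_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = a[i] ^ b[i];
}

/* Models "vptest x, x; sete": true when every bit of x is zero. */
static bool all_zero(const int32_t *x)
{
    for (size_t i = 0; i < LANES; i++)
        if (x[i] != 0)
            return false;
    return true;
}

/* All-ones check in the (xor op constm1) form. */
static bool allones_via_xor(const int32_t *op)
{
    int32_t m1[LANES], t[LANES];
    for (size_t i = 0; i < LANES; i++)
        m1[i] = -1;
    lane_xor(op, m1, t);
    return all_zero(t);
}

/* All-ones check in the (eq op 0) form.
   Assumption: each lane of op is 0 or -1 (a comparison mask);
   for arbitrary lane values the two forms differ. */
static bool allones_via_eq(const int32_t *op)
{
    int32_t zero[LANES] = {0}, t[LANES];
    lane_eq(op, zero, t);
    return all_zero(t);
}
```

With op = (eq (eq a 0) 0), the second form becomes a chain of eq-against-zero operations, which looks like the same shape as the (eq (eq op const0) const0) pattern discussed above; whether combine actually simplifies it is not shown by this sketch.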
