https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #10)
> clang generates
> 
> avx512:
> f(int*, long):
>         vmovdqu xmm0, xmmword ptr [rdi]
>         vptestnmd       k0, xmm0, xmm0
>         kortestb        k0, k0
>         sete    al
>         ret
> 
> avx2:
> f(int*, long):
>         vpxor   xmm0, xmm0, xmm0
>         vpcmpeqd        xmm0, xmm0, xmmword ptr [rdi]
>         vmovmskps       eax, xmm0
>         test    eax, eax
>         sete    al
>         ret
> 
> Maybe GCC can reuse cstorem4 similar as cbranchm4 for those mask.

Yes. I have not tried to implement native vector mask reduction; instead,
I'm going via a data bool vector in the epilogue so that it uses already-tested code.
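The source of f is not reproduced in this comment. Judging from the assembly alone, both clang sequences load four ints from rdi and return 1 only if none of them is zero. The C sketch that follows contrasts the two reduction styles discussed above.

- A native mask reduction packs the per-lane compare results into a bitmask and tests it, as vpcmpeqd + vmovmskps + test + sete do in the AVX2 output.
- A reduction over a data bool vector keeps each lane as an ordinary 0 or -1 integer and AND-reduces the lanes.

The function names are hypothetical. The second function is only a scalar model of the idea, not GCC's actual vectorizer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical name. "Native mask" style, modelled on clang's AVX2 output:
   compare each of the four lanes with zero, pack the results into a
   bitmask, then test the mask against zero. */
bool all_nonzero_mask(const int *p)
{
    assert(p != NULL);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);       /* vmovdqu   */
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());  /* vpcmpeqd  */
    int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));       /* vmovmskps */
#else
    /* Portable fallback: build the same 4-bit mask one lane at a time. */
    int mask = 0;
    for (int i = 0; i < 4; ++i)
        mask |= (p[i] == 0) << i;   /* bit i set if lane i is zero */
#endif
    return mask == 0;               /* test eax, eax; sete al */
}

/* Hypothetical name. "Data bool vector" style: each lane holds 0 or -1
   (all bits set) as an ordinary int, and the lanes are AND-reduced. */
bool all_nonzero_data(const int *p)
{
    assert(p != NULL);
    int lanes[4];
    for (int i = 0; i < 4; ++i)
        lanes[i] = p[i] != 0 ? -1 : 0;
    int acc = -1;                   /* identity for bitwise AND */
    for (int i = 0; i < 4; ++i)
        acc &= lanes[i];
    return acc != 0;
}
```

Both functions should agree for any four-int input. For example, {1, 0, 3, 4} yields false from each, and {-1, 7, -8, 1} yields true. The intrinsics path is compiled only when the compiler targets SSE2. Elsewhere, the loop builds the same bitmask that vmovmskps would produce.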
