https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818

Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #3 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
https://godbolt.org/z/YMb6Y7TMs shows a fairly minimal example of the
fixed-size mask conversions. If the optimizer were able to see through all of
the operations, it would compile to a simple memcpy.

This sequence is a no-op on the mask (vector mask to bitmask and back again):
        vmovmskps       eax, ymm0 // ymm0 is known to be a mask
        movzx   eax, al
        vmovd   xmm0, eax
        vpbroadcastd    ymm0, xmm0
        vpand   ymm0, ymm0, YMMWORD PTR .LC0[rip]
        vpxor   xmm2, xmm2, xmm2
        vpcmpgtd        ymm0, ymm0, ymm2

.LC0:
        .long   1
        .long   2
        .long   4
        .long   8
        .long   16
        .long   32
        .long   64
        .long   128
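
To make it easier to see why the sequence above is an identity, here is a
scalar C++ model of it. This is only an illustrative sketch, not code from the
bug report or from libstdc++: the names Mask8, to_bits, from_bits and
check_round_trip are hypothetical, and it assumes that every lane of a valid
mask is either 0 or -1 (all bits set).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of an 8-lane vector mask with 32 bits per lane.
// Assumption: each lane is either 0 (false) or -1 (all bits set, true).
using Mask8 = std::array<std::int32_t, 8>;

// Models vmovmskps + movzx: gather the sign bit of every lane into one byte.
std::uint32_t to_bits(const Mask8& m) {
    std::uint32_t bits = 0;
    for (int i = 0; i < 8; ++i)
        if (m[i] < 0) bits |= 1u << i;
    return bits;
}

// Models vmovd + vpbroadcastd + vpand with .LC0 + vpcmpgtd against zero:
// every lane sees the same byte, keeps only its own bit (1 << i), and
// becomes -1 if that bit is set, 0 otherwise.
Mask8 from_bits(std::uint32_t bits) {
    Mask8 m{};
    for (int i = 0; i < 8; ++i) {
        std::int32_t selected = static_cast<std::int32_t>(bits & (1u << i));
        m[i] = selected > 0 ? -1 : 0;
    }
    return m;
}

// Checks exhaustively that the round trip is the identity on valid masks.
void check_round_trip() {
    for (std::uint32_t bits = 0; bits < 256; ++bits) {
        Mask8 m = from_bits(bits);
        assert(to_bits(m) == bits);
        assert(from_bits(to_bits(m)) == m);
    }
}
```

The identity only holds because the input is known to be a mask. For an
arbitrary vector the round trip would normalize each lane to 0 or -1 based on
its sign bit, so the optimizer would need that knowledge to drop the sequence.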

I think it is out of scope to recognize patterns like this (which is why I
never reported them). We need a higher-level abstraction.
