https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93172

            Bug ID: 93172
           Summary: with AVX512 masked mov assigning zero can use {z}
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Testcase (cf. https://godbolt.org/z/DMQf9-):

#include <x86intrin.h>

// missed optimization:
__m512 f(__m512 x, __mmask16 k) {
    return _mm512_mask_mov_ps(x, _knot_mask16(k), __m512());
}

// f should be translated like this:
__m512 g(__m512 x, __mmask16 k) {
    return _mm512_maskz_mov_ps(k, x);
}

GCC translates f to:

  vxorps xmm1, xmm1, xmm1
  kmovw k1, edi
  vmovaps zmm0{k1}, zmm1

It could use:

  kmovd k0, edi
  knotw k1, k0
  vmovaps zmm0 {k1} {z}, zmm0

like g does. I.e. whenever a constant zero is assigned under a negated
write-mask, the {z} variant of vmovaps should be used.
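
The equivalence of f and g claimed above can be checked without AVX512
hardware using a scalar model of the two intrinsics. This is only a sketch:
the `model_*` names and the `Vec` alias are hypothetical helpers introduced
for illustration, and the per-lane semantics are assumed to be "merge-masking
keeps the src lane where the mask bit is clear" and "zero-masking writes 0
where the mask bit is clear".

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-in for a 16-lane float vector (__m512).
using Vec = std::array<float, 16>;

// Model of merge-masking (_mm512_mask_mov_ps): lane i takes a[i] if
// mask bit i is set, otherwise it keeps src[i].
Vec model_mask_mov(const Vec& src, std::uint16_t k, const Vec& a) {
    Vec r{};
    for (int i = 0; i < 16; ++i)
        r[i] = ((k >> i) & 1) ? a[i] : src[i];
    return r;
}

// Model of zero-masking (_mm512_maskz_mov_ps): lane i takes a[i] if
// mask bit i is set, otherwise it becomes 0.
Vec model_maskz_mov(std::uint16_t k, const Vec& a) {
    Vec r{};
    for (int i = 0; i < 16; ++i)
        r[i] = ((k >> i) & 1) ? a[i] : 0.0f;
    return r;
}

// Mirrors f: assign zero under the negated mask.
Vec model_f(const Vec& x, std::uint16_t k) {
    return model_mask_mov(x, static_cast<std::uint16_t>(~k), Vec{});
}

// Mirrors g: zero-masked move under the original mask.
Vec model_g(const Vec& x, std::uint16_t k) {
    return model_maskz_mov(k, x);
}

// Returns true if model_f and model_g agree for every 16-bit mask.
bool f_equals_g_for_all_masks() {
    Vec x{};
    for (int i = 0; i < 16; ++i)
        x[i] = static_cast<float>(i + 1);  // distinct non-zero lanes
    for (unsigned k = 0; k <= 0xFFFFu; ++k) {
        const auto m = static_cast<std::uint16_t>(k);
        if (model_f(x, m) != model_g(x, m))
            return false;
    }
    return true;
}
```

Under those assumptions both functions keep x in the lanes selected by k and
zero the rest, so the negation in f is absorbed by switching from
merge-masking to zero-masking.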

Clang even uses {z} for `_mm512_mask_mov_ps(x, k, __m512())` (i.e. without
negation of the mask), though it is unclear whether that is actually a
pessimization: https://godbolt.org/z/Nn4qXz
