https://llvm.org/bugs/show_bug.cgi?id=31933
Bug ID: 31933 Summary: AVX512: LLVM generates poor quality code involving masks Product: libraries Version: trunk Hardware: PC OS: All Status: NEW Severity: normal Priority: P Component: Backend: X86 Assignee: unassignedb...@nondot.org Reporter: wenzel.ja...@epfl.ch CC: llvm-bugs@lists.llvm.org Classification: Unclassified Clang/LLVM (trunk) seems to have issues generating code involving bitwise arithmetic of AVX512-style mask registers with sizes other than 16 bits. Consider the following two versions of a (somewhat contrived) program that computes three masks and then ORs them together: #include <immintrin.h> /* Version 1: use _mm512_kor */ __m512d test1(__m512d a, __m512d b, __m512d c) { __mmask8 m1 = _mm512_cmp_pd_mask(a, b, _CMP_LT_OQ); __mmask8 m2 = _mm512_cmp_pd_mask(b, c, _CMP_LT_OQ); __mmask8 m3 = _mm512_cmp_pd_mask(c, a, _CMP_LT_OQ); __mmask8 m = _mm512_kor(_mm512_kor(m1, m2), m3); return _mm512_mask_blend_pd(m, a, b); } /* Version 2: use operator| */ __m512d test2(__m512d a, __m512d b, __m512d c) { __mmask8 m1 = _mm512_cmp_pd_mask(a, b, _CMP_LT_OQ); __mmask8 m2 = _mm512_cmp_pd_mask(b, c, _CMP_LT_OQ); __mmask8 m3 = _mm512_cmp_pd_mask(c, a, _CMP_LT_OQ); __mmask8 m = m1 | m2 | m3; return _mm512_mask_blend_pd(m, a, b); } Version 1 (with _mm512_kor) generates the following code (uh, oh, lots of e*x <-> k* transitions, move with zero extend, etc.) __Z5test1Dv8_dS_S_: vcmplt_oqpd k0, zmm0, zmm1 kmovw eax, k0 vcmplt_oqpd k0, zmm1, zmm2 kmovw ecx, k0 vcmplt_oqpd k0, zmm2, zmm0 kmovw edx, k0 movzx eax, al movzx ecx, cl kmovw k0, ecx kmovw k1, eax korw k0, k1, k0 movzx eax, dl kmovw k1, eax korw k0, k0, k1 kmovw eax, k0 kmovw k1, eax vblendmpd zmm0 {k1}, zmm0, zmm1 ret Version 2 is better but still not ideal due to the unnecessary transitions between mask registers and regular registers. __Z5test2Dv8_dS_S_: vcmplt_oqpd k0, zmm0, zmm1 kmovw eax, k0 vcmplt_oqpd k0, zmm1, zmm2 kmovw ecx, k0 vcmplt_oqpd k0, zmm2, zmm0 kmovw edx, k0 or cl, al or cl, dl kmovw k1, ecx vblendmpd zmm0 {k1}, zmm0, zmm1 ret What I would have expected to see is this, but it doesn't seem that Clang/LLVM is able to generate it: vcmplt_oqpd k0, zmm0, zmm1 vcmplt_oqpd k1, zmm1, zmm2 vcmplt_oqpd k2, zmm2, zmm0 korw k0, k0, k1 korw k0, k0, k2 vblendmpd zmm0 {k0}, zmm0, zmm1 I think that either version 1 or version 2 (or both) should. It would be great if this could be fixed. -- You are receiving this mail because: You are on the CC list for the bug.
_______________________________________________ llvm-bugs mailing list llvm-bugs@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs