On Tue, Nov 24, 2020 at 4:27 AM Jeff Law <l...@redhat.com> wrote:
>
>
>
> On 11/4/20 2:19 AM, Hongtao Liu via Gcc-patches wrote:
> > Hi:
> >   When programmers explicitly use masked load intrinsics, don't
> > transform the instruction to vpblend{b,w,d,q}, since if mem_addr points
> > to a memory region with less than a whole vector size of accessible
> > memory, the mask prevents reading the inaccessible bytes, which
> > avoids a fault.
> >
> >   Bootstrap is ok, gcc regress test for i386/x86_64 backend is ok.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         PR target/97642
> >         * config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
> >         (*<avx512>_load<mode>_mask): New define_insns for masked load
> >         instructions.
> >         (<avx512>_load<mode>_mask): Changed to define_expands which
> >         specifically handle memory operands.
> >         (<avx512>_blendm<mode>): Changed to define_insns which are same
> >         as original <avx512>_load<mode>_mask with adjustment of
> >         operands order.
> >         (*<avx512>_load<mode>): New define_insn_and_split which is
> >         used to optimize for masked load with all one mask.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
> >         make sure only masked load instruction is generated.
> >         * gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
> >         * gcc.target/i386/pr97642-1.c: New test.
> >         * gcc.target/i386/pr97642-2.c: New test.
> >
> >
> > 0001-Fix-incorrect-replacement-of-vmovdqu32-with-vpblendd.patch
> >
> > From 48cf0adcd55395653891888f4768b8bdc19786f2 Mon Sep 17 00:00:00 2001
> > From: liuhongt <hongtao....@intel.com>
> > Date: Tue, 3 Nov 2020 17:26:43 +0800
> > Subject: [PATCH] Fix incorrect replacement of vmovdqu32 with vpblendd which
> >  can cause fault.
> >
> > gcc/ChangeLog:
> >
> >         PR target/97642
> >         * config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
> >         (*<avx512>_load<mode>_mask): New define_insns for masked load
> >         instructions.
> >         (<avx512>_load<mode>_mask): Changed to define_expands which
> >         specifically handle memory operands.
> >         (<avx512>_blendm<mode>): Changed to define_insns which are same
> >         as original <avx512>_load<mode>_mask with adjustment of
> >         operands order.
> >         (*<avx512>_load<mode>): New define_insn_and_split which is
> >         used to optimize for masked load with all one mask.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
> >         make sure only masked load instruction is generated.
> >         * gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
> >         * gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
> >         * gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
> >         * gcc.target/i386/pr97642-1.c: New test.
> >         * gcc.target/i386/pr97642-2.c: New test.
> So in the BZ Jakub asked for the all-ones mask case to be specially
> handled to emit a normal load.  I don't see where we're handling that.
> ISTM that we'd want a test for that too.  Right?
>

An all-ones mask is simplified to a simple load, but with an unspec in
set_src, and is handled by the following pattern:

+(define_insn_and_split "*<avx512>_load<mode>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+        (unspec:V48_AVX512VL
+          [(match_operand:V48_AVX512VL 1 "memory_operand")]
+          UNSPEC_MASKLOAD))]
+  "TARGET_AVX512F"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (match_dup 1))])

The corresponding testcase is

new file gcc/testsuite/gcc.target/i386/pr97642-1.c
@@ -0,0 +1,23 @@
+/* PR target/97642 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-not { k[0-8] } } } */
+
+#include <immintrin.h>
+__m128i
+foo1 (__m128i src, void const* P)
+{
+  return _mm_mask_loadu_epi32 (src, 15, P);
+}
+
+__m256i
+foo2 (__m256i src, void const* P)
+{
+  return _mm256_mask_loadu_epi32 (src, 255, P);
+}
+
+__m512i
+foo3 (__m512i src, void const* P)
+{
+  return _mm512_mask_loadu_epi32 (src, 65535 , P);
+}

> With that in place and tested, this is probably ready for the trunk.
>
> jeff
>
>

-- 
BR,
Hongtao
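
For readers without AVX-512 hardware, here is a minimal scalar sketch of the
semantics this thread relies on. It is only an illustrative model, not GCC code
and not an intrinsic: the names model_mask_loadu_epi32 and
model_mask_is_all_ones are hypothetical. It assumes the usual masked-load
behaviour, where a lane whose mask bit is clear keeps the value from src and
the matching memory element is never read. That is why a masked load must not
become a blend on a full-width load. The second helper checks for the all-ones
case, which is the one case where a plain load reads the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked load of 32-bit lanes.
   Lane i is read from mem only when bit i of mask is set; otherwise
   the lane is copied from src and mem[i] is not accessed at all.  */
static void
model_mask_loadu_epi32 (int32_t *dst, const int32_t *src, uint32_t mask,
                        const int32_t *mem, size_t lanes)
{
  assert (lanes <= 16);   /* 128/256/512-bit vectors hold 4/8/16 lanes.  */
  for (size_t i = 0; i < lanes; i++)
    dst[i] = ((mask >> i) & 1) ? mem[i] : src[i];
}

/* Nonzero when every one of the low `lanes` mask bits is set, i.e. the
   case in which the masked load touches the whole vector and can be
   treated as an ordinary load.  */
static int
model_mask_is_all_ones (uint32_t mask, size_t lanes)
{
  assert (lanes >= 1 && lanes <= 16);
  uint32_t full = (UINT32_C (1) << lanes) - 1;
  return (mask & full) == full;
}
```

With this model, loading 4 lanes with mask 1 from a one-element buffer reads
only that single element, while the masks 15, 255 and 65535 used in
pr97642-1.c are all-ones for 4, 8 and 16 lanes respectively.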