On Thu, May 28, 2020 at 8:00 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > "Yangfei (Felix)" <felix.y...@huawei.com> writes: > > Thanks for reviewing this. > > Attached please find the v5 patch. > > Note: we also need to modify local variable "mode" once we catch one case. > > I see test failure without this change. > > Looks good. Patch is OK assuming the x86 folks don't want to rewrite > gcc.target/i386/pr67609.c to avoid the new optimisation. I'll hold off > applying until the AVX512 thing is sorted. > > > Bootstrapped and tested on aarch64-linux-gnu. > > Also bootstrapped on x86_64-linux-gnu. Regression test show two failed > > tests on this platform: > > Thanks for the extra testing. > > > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors) > > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa > > > > I have adjust the second one in the v4 patch. > > So this is: > > movdqa reg(%rip), %xmm1 > movaps %xmm1, -24(%rsp) > movsd %xmm0, -24(%rsp) > movapd -24(%rsp), %xmm2 > movaps %xmm2, reg(%rip) > ret > > to: > > movq %xmm0, reg(%rip) > ret > > Nice. I think it's safe to say that's an improvement :-) > > I don't know whether this means we're no longer testing what the test > was intended to test. Maybe one of the x86 folks has an opinion about > whether we should instead rewrite the test somehow. > > > But The first one looks strange to me. > > I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like: > > > > 125 vcvtps2ph $0, %zmm0, -112(%rbp){%k1} > > 126 vcvtps2ph $0, %zmm0, -80(%rbp){%k1}{z} > > > > This happens in the combine phase, where an combined insn looks like: > > > > 1989 Trying 31 -> 33: > > 1990 31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] > > 133,[frame:DI-0x60],r109:HI) > > 1991 33: [frame:DI-0x60]=r106:V16HI > > 1992 REG_DEAD r106:V16HI > > 1993 Successfully matched this instruction: > > 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > 1995 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > > A256]) > > 1996 (vec_merge:V16HI (unspec:V16HI [ > > 1997 (reg:V16SF 103) > > 1998 (const_int 0 [0]) > > 1999 ] UNSPEC_VCVTPS2PH) > > 2000 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > 2001 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > > A256]) > > 2002 (reg:HI 109))) > > 2003 allowing combination of insns 31 and 33 > > 2004 original costs 16 + 4 = 20 > > 2005 replacement cost 16 > > 2006 deferring deletion of insn with uid = 31. > > 2007 modifying insn i3 33: > > [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI) > > 2008 deferring rescan insn with uid = 33. > > > > And this can be matched with pattern: avx512f_vcvtps2ph512_mask > > 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > 2283 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > > A256]) > > 2284 (vec_merge:V16HI (unspec:V16HI [ > > 2285 (reg:V16SF 103) > > 2286 (const_int 0 [0]) > > 2287 ] UNSPEC_VCVTPS2PH) > > 2288 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > 2289 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 > > S32 A256]) > > 2290 (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 > > {avx512f_vcvtps2ph512_mask} > > 2291 (nil)) > > > > gcc/config/i386/sse.md: > > 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>" > > 21664 [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") > > 21665 (unspec:V16HI > > 21666 [(match_operand:V16SF 1 "register_operand" "v") > > 21667 (match_operand:SI 2 "const_0_to_255_operand" "N")] > > 21668 UNSPEC_VCVTPS2PH))] > > 21669 "TARGET_AVX512F" > > 21670 "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}" > > 21671 [(set_attr "type" "ssecvt") > > 21672 (set_attr "prefix" "evex") > > 21673 (set_attr "mode" "V16SF")]) > > > > How can that happen? > > This is due to define_subst magic. The generators automatically > create a vec_merge form of the instruction based on the information > in the <mode_*> attributes. > > AFAICT the rtl above is for the line-125 instruction, which looks ok. > The problem is the line-126 instruction, since vcvtps2ph doesn't > AIUI allow zero masking. > > The "mask" define_subst allows both zeroing and merging, > so I guess this means that the pattern should either be using > a different define_subst, or should be enforcing merging in > some other way. Please could one of the x86 devs take a look? >
Hongtao, can you take a look? Thanks. -- H.J.