On Thu, May 28, 2020 at 8:00 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "Yangfei (Felix)" <felix.y...@huawei.com> writes:
> > Thanks for reviewing this.
> > Attached please find the v5 patch.
> > Note: we also need to modify local variable "mode" once we catch one case.  
> > I see test failure without this change.
>
> Looks good.  Patch is OK assuming the x86 folks don't want to rewrite
> gcc.target/i386/pr67609.c to avoid the new optimisation.  I'll hold off
> applying until the AVX512 thing is sorted.
>
> > Bootstrapped and tested on aarch64-linux-gnu.
> > Also bootstrapped on x86_64-linux-gnu.  Regression test show two failed 
> > tests on this platform:
>
> Thanks for the extra testing.
>
> > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa
> >
> > I have adjust the second one in the v4 patch.
>
> So this is:
>
>         movdqa  reg(%rip), %xmm1
>         movaps  %xmm1, -24(%rsp)
>         movsd   %xmm0, -24(%rsp)
>         movapd  -24(%rsp), %xmm2
>         movaps  %xmm2, reg(%rip)
>         ret
>
> to:
>
>         movq    %xmm0, reg(%rip)
>         ret
>
> Nice.  I think it's safe to say that's an improvement :-)
>
> I don't know whether this means we're no longer testing what the test
> was intended to test.  Maybe one of the x86 folks has an opinion about
> whether we should instead rewrite the test somehow.
>
> > But The first one looks strange to me.
> > I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like:
> >
> > 125         vcvtps2ph       $0, %zmm0, -112(%rbp){%k1}
> > 126         vcvtps2ph       $0, %zmm0, -80(%rbp){%k1}{z}
> >
> > This happens in the combine phase, where an combined insn looks like:
> >
> > 1989 Trying 31 -> 33:
> > 1990    31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] 
> > 133,[frame:DI-0x60],r109:HI)
> > 1991    33: [frame:DI-0x60]=r106:V16HI
> > 1992       REG_DEAD r106:V16HI
> > 1993 Successfully matched this instruction:
> > 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > 1995             (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 
> > A256])
> > 1996     (vec_merge:V16HI (unspec:V16HI [
> > 1997                 (reg:V16SF 103)
> > 1998                 (const_int 0 [0])
> > 1999             ] UNSPEC_VCVTPS2PH)
> > 2000         (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > 2001                 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 
> > A256])
> > 2002         (reg:HI 109)))
> > 2003 allowing combination of insns 31 and 33
> > 2004 original costs 16 + 4 = 20
> > 2005 replacement cost 16
> > 2006 deferring deletion of insn with uid = 31.
> > 2007 modifying insn i3    33: 
> > [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] 133,[frame:DI-0x60],r109:HI)
> > 2008 deferring rescan insn with uid = 33.
> >
> > And this can be matched with pattern: avx512f_vcvtps2ph512_mask
> > 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > 2283                 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 
> > A256])
> > 2284         (vec_merge:V16HI (unspec:V16HI [
> > 2285                     (reg:V16SF 103)
> > 2286                     (const_int 0 [0])
> > 2287                 ] UNSPEC_VCVTPS2PH)
> > 2288             (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > 2289                     (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 
> > S32 A256])
> > 2290             (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 
> > {avx512f_vcvtps2ph512_mask}
> > 2291      (nil))
> >
> > gcc/config/i386/sse.md:
> > 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
> > 21664   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> > 21665         (unspec:V16HI
> > 21666           [(match_operand:V16SF 1 "register_operand" "v")
> > 21667            (match_operand:SI 2 "const_0_to_255_operand" "N")]
> > 21668           UNSPEC_VCVTPS2PH))]
> > 21669   "TARGET_AVX512F"
> > 21670   "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
> > 21671   [(set_attr "type" "ssecvt")
> > 21672    (set_attr "prefix" "evex")
> > 21673    (set_attr "mode" "V16SF")])
> >
> > How can that happen?
>
> This is due to define_subst magic.  The generators automatically
> create a vec_merge form of the instruction based on the information
> in the <mode_*> attributes.
>
> AFAICT the rtl above is for the line-125 instruction, which looks ok.
> The problem is the line-126 instruction, since vcvtps2ph doesn't
> AIUI allow zero masking.
>
> The "mask" define_subst allows both zeroing and merging,
> so I guess this means that the pattern should either be using
> a different define_subst, or should be enforcing merging in
> some other way.  Please could one of the x86 devs take a look?
>

Hongtao, can you take a look?

Thanks.


-- 
H.J.

Reply via email to