"Yangfei (Felix)" <felix.y...@huawei.com> writes:
> Thanks for reviewing this.
> Attached please find the v5 patch.
> Note: we also need to modify local variable "mode" once we catch one case.  I 
> see test failure without this change.

Looks good.  Patch is OK assuming the x86 folks don't want to rewrite
gcc.target/i386/pr67609.c to avoid the new optimisation.  I'll hold off
applying until the AVX512 thing is sorted.

> Bootstrapped and tested on aarch64-linux-gnu.
> Also bootstrapped on x86_64-linux-gnu.  Regression test show two failed tests 
> on this platform:

Thanks for the extra testing.

> 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa
>
> I have adjust the second one in the v4 patch.

So this is:

        movdqa  reg(%rip), %xmm1
        movaps  %xmm1, -24(%rsp)
        movsd   %xmm0, -24(%rsp)
        movapd  -24(%rsp), %xmm2
        movaps  %xmm2, reg(%rip)
        ret

to:

        movq    %xmm0, reg(%rip)
        ret

Nice.  I think it's safe to say that's an improvement :-)

I don't know whether this means we're no longer testing what the test
was intended to test.  Maybe one of the x86 folks has an opinion about
whether we should instead rewrite the test somehow.

> But The first one looks strange to me.
> I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like:
>
> 125         vcvtps2ph       $0, %zmm0, -112(%rbp){%k1}
> 126         vcvtps2ph       $0, %zmm0, -80(%rbp){%k1}{z}
>
> This happens in the combine phase, where an combined insn looks like:
>
> 1989 Trying 31 -> 33:
> 1990    31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] 
> 133,[frame:DI-0x60],r109:HI)
> 1991    33: [frame:DI-0x60]=r106:V16HI
> 1992       REG_DEAD r106:V16HI
> 1993 Successfully matched this instruction:
> 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 1995             (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 A256])
> 1996     (vec_merge:V16HI (unspec:V16HI [
> 1997                 (reg:V16SF 103)
> 1998                 (const_int 0 [0])
> 1999             ] UNSPEC_VCVTPS2PH)
> 2000         (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2001                 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 
> A256])
> 2002         (reg:HI 109)))
> 2003 allowing combination of insns 31 and 33
> 2004 original costs 16 + 4 = 20
> 2005 replacement cost 16
> 2006 deferring deletion of insn with uid = 31.
> 2007 modifying insn i3    33: [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] 
> 133,[frame:DI-0x60],r109:HI)
> 2008 deferring rescan insn with uid = 33.
>
> And this can be matched with pattern: avx512f_vcvtps2ph512_mask
> 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2283                 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 
> A256])
> 2284         (vec_merge:V16HI (unspec:V16HI [
> 2285                     (reg:V16SF 103)
> 2286                     (const_int 0 [0])
> 2287                 ] UNSPEC_VCVTPS2PH)
> 2288             (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> 2289                     (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 
> S32 A256])
> 2290             (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 
> {avx512f_vcvtps2ph512_mask}
> 2291      (nil))
>
> gcc/config/i386/sse.md:
> 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
> 21664   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> 21665         (unspec:V16HI
> 21666           [(match_operand:V16SF 1 "register_operand" "v")
> 21667            (match_operand:SI 2 "const_0_to_255_operand" "N")]
> 21668           UNSPEC_VCVTPS2PH))]
> 21669   "TARGET_AVX512F"
> 21670   "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
> 21671   [(set_attr "type" "ssecvt")
> 21672    (set_attr "prefix" "evex")
> 21673    (set_attr "mode" "V16SF")])
>
> How can that happen? 

This is due to define_subst magic.  The generators automatically
create a vec_merge form of the instruction based on the information
in the <mode_*> attributes.

AFAICT the rtl above is for the line-125 instruction, which looks ok.
The problem is the line-126 instruction, since vcvtps2ph doesn't
AIUI allow zero masking.

The "mask" define_subst allows both zeroing and merging,
so I guess this means that the pattern should either be using
a different define_subst, or should be enforcing merging in
some other way.  Please could one of the x86 devs take a look?

Thanks,
Richard

Reply via email to