On Fri, Mar 22, 2019 at 11:40 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Fri, Mar 22, 2019 at 11:11:58AM +0100, Uros Bizjak wrote:
> > > For FMA, naturally only the two operands that are multiplied should be
> > > commutative, but in most patterns one of those two uses "0" or "0,0"
> >
> > This should be safe, we have had "*add<mode>_1" for decades that does
> > just the above.
>
> Sure, the 0 isn't a problem in itself.
>
> > > constraint and there is one or two match_dup 1 for it, so it really
> > > isn't commutative.
> >
> > Hm, this situation involving match_dup needs some more thinking...
>
> But this one is.
> If one reads the documentation
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask_fmadd_sd&expand=5236,2545
> or
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_maskz_fmadd_sd&expand=5236,2545,2547
> then even in the source form related description the a and b arguments
> aren't commutative, because a is used 3 or 2 times while b is used just
> once.
>
> Compare to
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask3_fmadd_sd&expand=5236,2545,2547,2546
> where a and b are both used just once and which thus can be commutative,
> only c is used 3 times.
>
> For this reason, even _mm{,256,512}_mask_f{,n}m{add,sub}_p{s,d} aren't using 
> % and
> IMHO can't.
> Now, _mm{,256,512}_maskz_f{,n}m{add,sub}_p{s,d} actually do use % and can,
> because both a and b are used just once, it is filled with zeros if mask bit
> is 0.  That is different from _mm_maskz_f{,n}m{add,sub}_s{s,d} which
> fills with 0 only the first element and the elements above it are copied
> from a.
>
> > > Which leaves us with the 4 mask3 patterns only, as I said above, for
> > > the first two where neither of those are negated I think % should be ok.
> > > For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them
> > > is negated.  On the other side, seems various other existing fnm*
> > > patterns use % even on those.
> >
> > It is safe to use even if one of the first two operands is negated.
> > According to the documentation, the negation represents negation of
> > the intermediate product, so it doesn't matter which operand is
> > negated.
>
> So like this if it passes bootstrap/regtest?

Yes.

> PS., it would be nice to have some testsuite coverage also for cases where
> the intrinsics are called from noipa wrapper functions and one of the
> arguments is __m128{,d} *x and passes *x to the intrinsic (repeated so that
> we test all non-mask arguments that way).  And it would be nice to also have
> testcases with constant -1 masks and with constant 0 mask.  I just compiled
> attached sources and eyeballed the result that at least on some cases it
> performed the expected simplifications, but probably don't have spare cycles
> now to turn that all into something suitable for the testsuite (ideally it
> would be both test for specific instructions and runtime test).
>
> 2019-03-22  Jakub Jelinek  <ja...@redhat.com>
>
>         * config/i386/sse.md (<avx512>_fmadd_<mode>_mask3<round_name>,
>         <avx512>_fmsub_<mode>_mask3<round_name>,
>         <avx512>_fnmadd_<mode>_mask3<round_name>,
>         <avx512>_fnmsub_<mode>_mask3<round_name>,
>         avx512f_vmfmadd_<mode>_mask3<round_name>,
>         avx512f_vmfmsub_<mode>_mask3<round_name>,
>         *avx512f_vmfnmadd_<mode>_mask3<round_name>): Use 
> <round_nimm_predicate>
>         instead of register_operand and %v instead of v for match_operand 1.
>         (avx512f_vmfnmsub_<mode>_mask3<round_name>): Rename to ...
>         (*avx512f_vmfnmsub_<mode>_mask3<round_name>): ... this.  Use
>         <round_nimm_predicate> instead of register_operand and %v instead of v
>         for match_operand 1.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-03-22 11:11:58.330060594 +0100
> +++ gcc/config/i386/sse.md      2019-03-22 11:21:12.901952453 +0100
> @@ -3973,7 +3973,7 @@ (define_insn "<avx512>_fmadd_<mode>_mask
>    [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
>         (vec_merge:VF_AVX512VL
>           (fma:VF_AVX512VL
> -           (match_operand:VF_AVX512VL 1 "register_operand" "v")
> +           (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")
>             (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>             (match_operand:VF_AVX512VL 3 "register_operand" "0"))
>           (match_dup 3)
> @@ -4094,7 +4094,7 @@ (define_insn "<avx512>_fmsub_<mode>_mask
>    [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
>         (vec_merge:VF_AVX512VL
>           (fma:VF_AVX512VL
> -           (match_operand:VF_AVX512VL 1 "register_operand" "v")
> +           (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")
>             (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>             (neg:VF_AVX512VL
>               (match_operand:VF_AVX512VL 3 "register_operand" "0")))
> @@ -4217,7 +4217,7 @@ (define_insn "<avx512>_fnmadd_<mode>_mas
>         (vec_merge:VF_AVX512VL
>           (fma:VF_AVX512VL
>             (neg:VF_AVX512VL
> -             (match_operand:VF_AVX512VL 1 "register_operand" "v"))
> +             (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v"))
>             (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>             (match_operand:VF_AVX512VL 3 "register_operand" "0"))
>           (match_dup 3)
> @@ -4345,7 +4345,7 @@ (define_insn "<avx512>_fnmsub_<mode>_mas
>         (vec_merge:VF_AVX512VL
>           (fma:VF_AVX512VL
>             (neg:VF_AVX512VL
> -             (match_operand:VF_AVX512VL 1 "register_operand" "v"))
> +             (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v"))
>             (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>             (neg:VF_AVX512VL
>               (match_operand:VF_AVX512VL 3 "register_operand" "0")))
> @@ -4667,7 +4667,7 @@ (define_insn "avx512f_vmfmadd_<mode>_mas
>         (vec_merge:VF_128
>           (vec_merge:VF_128
>             (fma:VF_128
> -             (match_operand:VF_128 1 "register_operand" "v")
> +             (match_operand:VF_128 1 "<round_nimm_predicate>" "%v")
>               (match_operand:VF_128 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>               (match_operand:VF_128 3 "register_operand" "0"))
>             (match_dup 3)
> @@ -4737,7 +4737,7 @@ (define_insn "avx512f_vmfmsub_<mode>_mas
>         (vec_merge:VF_128
>           (vec_merge:VF_128
>             (fma:VF_128
> -             (match_operand:VF_128 1 "register_operand" "v")
> +             (match_operand:VF_128 1 "<round_nimm_predicate>" "%v")
>               (match_operand:VF_128 2 "<round_nimm_predicate>" 
> "<round_constraint>")
>               (neg:VF_128
>                 (match_operand:VF_128 3 "register_operand" "0")))
> @@ -4797,7 +4797,7 @@ (define_insn "*avx512f_vmfnmadd_<mode>_m
>             (fma:VF_128
>               (neg:VF_128
>                 (match_operand:VF_128 2 "<round_nimm_predicate>" 
> "<round_constraint>"))
> -             (match_operand:VF_128 1 "register_operand" "v")
> +             (match_operand:VF_128 1 "<round_nimm_predicate>" "%v")
>               (match_operand:VF_128 3 "register_operand" "0"))
>             (match_dup 3)
>             (match_operand:QI 4 "register_operand" "Yk"))
> @@ -4849,14 +4849,14 @@ (define_insn "*avx512f_vmfnmsub_<mode>_m
>    [(set_attr "type" "ssemuladd")
>     (set_attr "mode" "<MODE>")])
>
> -(define_insn "avx512f_vmfnmsub_<mode>_mask3<round_name>"
> +(define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>"
>    [(set (match_operand:VF_128 0 "register_operand" "=v")
>         (vec_merge:VF_128
>           (vec_merge:VF_128
>             (fma:VF_128
>               (neg:VF_128
>                 (match_operand:VF_128 2 "<round_nimm_predicate>" 
> "<round_constraint>"))
> -             (match_operand:VF_128 1 "register_operand" "v")
> +             (match_operand:VF_128 1 "<round_nimm_predicate>" "%v")
>               (neg:VF_128
>                 (match_operand:VF_128 3 "register_operand" "0")))
>             (match_dup 3)
>
>
>         Jakub

Reply via email to