On Fri, Mar 22, 2019 at 11:40 AM Jakub Jelinek <ja...@redhat.com> wrote: > > On Fri, Mar 22, 2019 at 11:11:58AM +0100, Uros Bizjak wrote: > > > For FMA, naturally only the two operands that are multiplied should be > > > commutative, but in most patterns one of those two uses "0" or "0,0" > > > > This should be safe, we have had "*add<mode>_1" for decades that does > > just the above. > > Sure, the 0 isn't a problem in itself. > > > > constraint and there is one or two match_dup 1 for it, so it really > > > isn't commutative. > > > > Hm, this situation involving match_dup needs some more thinking... > > But this one is. > If one reads the documentation > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask_fmadd_sd&expand=5236,2545 > or > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_maskz_fmadd_sd&expand=5236,2545,2547 > then even in the source form related description the a and b arguments > aren't commutative, because a is used 3 or 2 times while b is used just > once. > > Compare to > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask3_fmadd_sd&expand=5236,2545,2547,2546 > where a and b are both used just once and which thus can be commutative, > only c is used 3 times. > > For this reason, even _mm{,256,512}_mask_f{,n}m{add,sub}_p{s,d} aren't using > % and > IMHO can't. > Now, _mm{,256,512}_maskz_f{,n}m{add,sub}_p{s,d} actually do use % and can, > because both a and b are used just once, it is filled with zeros if mask bit > is 0. That is different from _mm_maskz_f{,n}m{add,sub}_s{s,d} which > fills with 0 only the first element and the elements above it are copied > from a. > > > > Which leaves us with the 4 mask3 patterns only, as I said above, for > > > the first two where neither of those are negated I think % should be ok. > > > For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them > > > is negated. On the other side, seems various other existing fnm* > > > patterns use % even on those. > > > > It is safe to use even if one of the first two operands is negated. > > According to the documentation, the negation represents negation of > > the intermediate product, so it doesn't matter which operand is > > negated. > > So like this if it passes bootstrap/regtest?
Yes. > PS., it would be nice to have some testsuite coverage also for cases where > the intrinsics are called from noipa wrapper functions and one of the > arguments is __m128{,d} *x and passes *x to the intrinsic (repeated so that > we test all non-mask arguments that way). And it would be nice to also have > testcases with constant -1 masks and with constant 0 mask. I just compiled > attached sources and eyeballed the result that at least on some cases it > performed the expected simplifications, but probably don't have spare cycles > now to turn that all into something suitable for the testsuite (ideally it > would be both test for specific instructions and runtime test). > > 2019-03-22 Jakub Jelinek <ja...@redhat.com> > > * config/i386/sse.md (<avx512>_fmadd_<mode>_mask3<round_name>, > <avx512>_fmsub_<mode>_mask3<round_name>, > <avx512>_fnmadd_<mode>_mask3<round_name>, > <avx512>_fnmsub_<mode>_mask3<round_name>, > avx512f_vmfmadd_<mode>_mask3<round_name>, > avx512f_vmfmsub_<mode>_mask3<round_name>, > *avx512f_vmfnmadd_<mode>_mask3<round_name>): Use > <round_nimm_predicate> > instead of register_operand and %v instead of v for match_operand 1. > (avx512f_vmfnmsub_<mode>_mask3<round_name>): Rename to ... > (*avx512f_vmfnmsub_<mode>_mask3<round_name>): ... this. Use > <round_nimm_predicate> instead of register_operand and %v instead of v > for match_operand 1. OK. Thanks, Uros. > --- gcc/config/i386/sse.md.jj 2019-03-22 11:11:58.330060594 +0100 > +++ gcc/config/i386/sse.md 2019-03-22 11:21:12.901952453 +0100 > @@ -3973,7 +3973,7 @@ (define_insn "<avx512>_fmadd_<mode>_mask > [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") > (vec_merge:VF_AVX512VL > (fma:VF_AVX512VL > - (match_operand:VF_AVX512VL 1 "register_operand" "v") > + (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v") > (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" > "<round_constraint>") > (match_operand:VF_AVX512VL 3 "register_operand" "0")) > (match_dup 3) > @@ -4094,7 +4094,7 @@ (define_insn "<avx512>_fmsub_<mode>_mask > [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v") > (vec_merge:VF_AVX512VL > (fma:VF_AVX512VL > - (match_operand:VF_AVX512VL 1 "register_operand" "v") > + (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v") > (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" > "<round_constraint>") > (neg:VF_AVX512VL > (match_operand:VF_AVX512VL 3 "register_operand" "0"))) > @@ -4217,7 +4217,7 @@ (define_insn "<avx512>_fnmadd_<mode>_mas > (vec_merge:VF_AVX512VL > (fma:VF_AVX512VL > (neg:VF_AVX512VL > - (match_operand:VF_AVX512VL 1 "register_operand" "v")) > + (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")) > (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" > "<round_constraint>") > (match_operand:VF_AVX512VL 3 "register_operand" "0")) > (match_dup 3) > @@ -4345,7 +4345,7 @@ (define_insn "<avx512>_fnmsub_<mode>_mas > (vec_merge:VF_AVX512VL > (fma:VF_AVX512VL > (neg:VF_AVX512VL > - (match_operand:VF_AVX512VL 1 "register_operand" "v")) > + (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")) > (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" > "<round_constraint>") > (neg:VF_AVX512VL > (match_operand:VF_AVX512VL 3 "register_operand" "0"))) > @@ -4667,7 +4667,7 @@ (define_insn "avx512f_vmfmadd_<mode>_mas > (vec_merge:VF_128 > (vec_merge:VF_128 > (fma:VF_128 > - (match_operand:VF_128 1 "register_operand" "v") > + (match_operand:VF_128 1 "<round_nimm_predicate>" "%v") > (match_operand:VF_128 2 "<round_nimm_predicate>" > "<round_constraint>") > (match_operand:VF_128 3 "register_operand" "0")) > (match_dup 3) > @@ -4737,7 +4737,7 @@ (define_insn "avx512f_vmfmsub_<mode>_mas > (vec_merge:VF_128 > (vec_merge:VF_128 > (fma:VF_128 > - (match_operand:VF_128 1 "register_operand" "v") > + (match_operand:VF_128 1 "<round_nimm_predicate>" "%v") > (match_operand:VF_128 2 "<round_nimm_predicate>" > "<round_constraint>") > (neg:VF_128 > (match_operand:VF_128 3 "register_operand" "0"))) > @@ -4797,7 +4797,7 @@ (define_insn "*avx512f_vmfnmadd_<mode>_m > (fma:VF_128 > (neg:VF_128 > (match_operand:VF_128 2 "<round_nimm_predicate>" > "<round_constraint>")) > - (match_operand:VF_128 1 "register_operand" "v") > + (match_operand:VF_128 1 "<round_nimm_predicate>" "%v") > (match_operand:VF_128 3 "register_operand" "0")) > (match_dup 3) > (match_operand:QI 4 "register_operand" "Yk")) > @@ -4849,14 +4849,14 @@ (define_insn "*avx512f_vmfnmsub_<mode>_m > [(set_attr "type" "ssemuladd") > (set_attr "mode" "<MODE>")]) > > -(define_insn "avx512f_vmfnmsub_<mode>_mask3<round_name>" > +(define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>" > [(set (match_operand:VF_128 0 "register_operand" "=v") > (vec_merge:VF_128 > (vec_merge:VF_128 > (fma:VF_128 > (neg:VF_128 > (match_operand:VF_128 2 "<round_nimm_predicate>" > "<round_constraint>")) > - (match_operand:VF_128 1 "register_operand" "v") > + (match_operand:VF_128 1 "<round_nimm_predicate>" "%v") > (neg:VF_128 > (match_operand:VF_128 3 "register_operand" "0"))) > (match_dup 3) > > > Jakub