On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > enabled since there's no corresponding instruction for general
> > > registers.
> > >
> > > gcc/
> > >         PR target/88808
> > >         * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > >         for mask registers.
> > >         (*movhi_internal): Ditto.
> > >         (*movqi_internal): Ditto.
> > >         (*anddi_1): Support mask register operations
> > >         (*and<mode>_1): Ditto.
> > >         (*andqi_1): Ditto.
> > >         (*andn<mode>_1): Ditto.
> > >         (*<code><mode>_1): Ditto.
> > >         (*<code>qi_1): Ditto.
> > >         (*one_cmpl<mode>2_1): Ditto.
> > >         (*one_cmplsi2_1_zext): Ditto.
> > >         (*one_cmplqi2_1): Ditto.
> > >
> > > gcc/testsuite/
> > >         * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > >         * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > >         * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > >         * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > >         * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > >         * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> >
> > index 74d207c3711..e8ad79d1b0a 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -2294,7 +2294,7 @@
> >
> >  (define_insn "*movsi_internal"
> >    [(set (match_operand:SI 0 "nonimmediate_operand"
> > -    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > +    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> >      (match_operand:SI 1 "general_operand"
> >      "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> >
> > I'd rather see *k everywhere, also with *movqi_internal and
> > *movhi_internal patterns. The "*" means that the allocator won't
> > allocate a mask register by default, but it will be used to optimize
> > moves. With the above change, you are risking that during integer
> > register pressure, the register allocator will allocate zero to a mask
> > register, and later "optimize" the move with a direct maskreg-intreg
> > move.
> >
> > The current strategy is that only general registers get allocated for
> > integer modes. Let's keep it this way for now.
> >
>
> Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> move zero into mask register directly.

Although it would be nice if the register allocator was smart enough,
the current strategy is to introduce peephole2 patterns to fix these
problems, similar to [1]. These peepholes can be introduced in a
follow-up patch.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html

> > Otherwise, the patchset LGTM, but please test the suggested changes and 
> > repost.
> >
> > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > used to distinguish mask operations, generated from builtins from
> > generic operations, so I'd like to keep them for a while. The drawback
> > is, that they are not combined with other operations, but at the end
> > of the day, this is what the programmer asked for by using builtins.
>
> Agree, I prefer to keep them.

Thinking some more about the approach, it looks to me that the optimal
solution is a post-reload splitter that would convert "generic"
patterns to mask operations from sse.md. The mask operations don't set
flags, so we can substantially improve post reload scheduling of these
instructions by removing flags clobber.

So, simply add "#" to relevant alternatives of logic patterns and add
something like:

--cut here--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 41c6dbfa668..ad49bdc7583 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1470,6 +1470,18 @@
           ]
           (const_string "<MODE>")))])

+(define_split
+  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+       (any_logic:SWI1248_AVX512BW
+         (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+         (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(parallel
+     [(set (match_dup 0)
+          (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
+      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
 (define_insn "kandn<mode>"
   [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
        (and:SWI1248_AVX512BW
--cut here--

and similar for kandn and knot in sse.md. You will have to add
mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md
for example.

We don't lose anything, because all important transformations,
propagations and simplifications with these patterns happen before
reload.

Uros.

Reply via email to