Hi Richard,

Thanks for the reviews, I'm making these changes but just a heads up.

When hardcoding LR_REGNUM like this we need to change the way we compare the 
register in doloop_condition_get. This function currently compares the rtx 
nodes by address, which I think happens to be fine before we assign hard 
registers, as I suspect we always share the rtx node for the same pseudo, but 
when assigning registers it seems like we create copies, so things like:
`XEXP (inc_src, 0) == reg` will fail for
inc_src: (plus (reg LR) (const_int -n)'
reg: (reg LR)

Instead I will substitute the operand '==' with calls to 'rtx_equal_p (op1, 
op2, NULL)'.

Sound good?

Kind regards,
Andre

________________________________________
From: Richard Earnshaw (lists) <richard.earns...@arm.com>
Sent: Tuesday, January 30, 2024 11:36 AM
To: Andre Simoes Dias Vieira; gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov; Stam Markianos-Wright
Subject: Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low 
Overhead Loops

On 19/01/2024 14:40, Andre Vieira wrote:
>
> Respin after comments from Kyrill and rebase. I also removed an if-then-else
> construct in arm_mve_check_reg_origin_is_num_elems similar to the other 
> functions
> Kyrill pointed out.
>
> After an earlier comment from Richard Sandiford I also added comments to the
> two tail predication patterns added to explain the need for the unspecs.

[missing ChangeLog]

I'm just going to focus on loop-doloop.c in this reply, I'll respond to the 
other bits in a follow-up.

      2)  (set (reg) (plus (reg) (const_int -1))
-         (set (pc) (if_then_else (reg != 0)
-                                (label_ref (label))
-                                (pc))).
+        (set (pc) (if_then_else (reg != 0)
+                                (label_ref (label))
+                                (pc))).

      Some targets (ARM) do the comparison before the branch, as in the
      following form:

-     3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
-                   (set (reg) (plus (reg) (const_int -1)))])
-        (set (pc) (if_then_else (cc == NE)
...


This comment is becoming confusing.  Really the text leading up to 3)... should 
be inside 3.  Something like:

      3) Some targets (ARM) do the comparison before the branch, as in the
      following form:

      (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0))
                 (set (reg) (plus (reg) (const_int -1)))])
      (set (pc) (if_then_else (cc == NE)
                              (label_ref (label))
                              (pc)))])


The same issue on the comment structure also applies to the new point 4...

+      The ARM target also supports a special case of a counter that decrements
+      by `n` and terminating in a GTU condition.  In that case, the compare and
+      branch are all part of one insn, containing an UNSPEC:
+
+      4) (parallel [
+           (set (pc)
+               (if_then_else (gtu (unspec:SI [(plus:SI (reg:SI 14 lr)
+                                                       (const_int -n))])
+                                  (const_int n-1]))
+                   (label_ref)
+                   (pc)))
+           (set (reg:SI 14 lr)
+                (plus:SI (reg:SI 14 lr)
+                         (const_int -n)))
+     */

I think this needs a bit more clarification.  Specifically that this construct 
supports a predicated vectorized do loop.  Also, the placement of the unspec 
inside the comparison is ugnly and unnecessary.  It should be sufficient to 
have the unspec inside a USE expression, which the mid-end can then ignore 
entirely.  So

        (parallel
         [(set (pc) (if_then_else (gtu (plus (reg) (const_int -n))
                                       (const_int n-1))
                                  (label_ref) (pc)))
          (set (reg) (plus (reg) (const_int -n)))
          (additional clobbers and uses)])

For Arm, we then add a (use (unspec [(const_int 0)] N)) that is specific to 
this pattern to stop anything else from matching it.

Note that we don't need to mention that the register is 'LR' or the modes, 
those are specific to a particular backend, not the generic pattern we want to 
match.

+      || !CONST_INT_P (XEXP (inc_src, 1))
+      || INTVAL (XEXP (inc_src, 1)) >= 0)
     return 0;
+  int dec_num = abs (INTVAL (XEXP (inc_src, 1)));

We can just use '-INTVAL(...)' here, we've verified just above that the 
constant is negative.

-  if ((XEXP (condition, 0) == reg)
+  /* For the ARM special case of having a GTU: re-form the condition without
+     the unspec for the benefit of the middle-end.  */
+  if (GET_CODE (condition) == GTU)
+    {
+      condition = gen_rtx_fmt_ee (GTU, VOIDmode, inc_src,
+                                 GEN_INT (dec_num - 1));
+      return condition;
+    }

If you make the change I mentioned above, this re-forming isn't needed any 
more, so the arm-specific comment goes away

-   {
+    {
      if (GET_CODE (pattern) != PARALLEL)
      /*  For the second form we expect:

You've fixed the indentation of the brace (good), but the body of the braced 
expression needs re-indenting as well.

R.

Reply via email to