Hi Jeff,
Some comments since I wrote this patch some time ago...

On 28/08/17 19:26, Jeff Law wrote:
On 08/10/2017 03:14 PM, Michael Collison wrote:
Hi all,

One issue that we keep encountering on aarch64 is GCC not making good use of 
the flag-setting arithmetic instructions
like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the 
result against zero.
They are represented in a fairly standard way in the backend as PARALLEL 
patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])

GCC isn't forming these from separate arithmetic and comparison instructions as 
aggressively as it could.
A particular pain point is when the result of the arithmetic insn is used 
before the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
         (plus:SI (reg:SI 0 x0 [ x ])
             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
      (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
      (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
      (nil))

This scares combine away as x0 is used in insn 33 as well as the comparison in 
insn 34.
I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.

It's the multiple use IIRC.

This patch extends it by handling comparisons against zero, finding the 
defining instruction of the compare
and merging the comparison with the defining instruction into a PARALLEL that 
will hopefully match the form
described above. If between the comparison and the defining insn we find an 
instruction that uses the condition
registers or any control flow we bail out, and we don't cross basic blocks.
This simple technique allows us to catch cases such as this example.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and 
x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>

        * compare-elim.c: Include emit-rtl.h.
        (can_merge_compare_into_arith): New function.
        (try_validate_parallel): Likewise.
        (try_merge_compare): Likewise.
        (try_eliminate_compare): Call the above when no previous clobber
        is available.
        (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
        dataflow problems.

2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>
        
        * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.


pr5198v1.patch


diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..c65d155 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tm_p.h"
  #include "insn-config.h"
  #include "recog.h"
+#include "emit-rtl.h"
  #include "cfgrtl.h"
  #include "tree-pass.h"
  #include "domwalk.h"
@@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn 
*start)
    return reg;
  }
+/* Return true if it is okay to merge the comparison CMP_INSN with
+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+       insn && insn != arith_insn;
+       insn = PREV_INSN (insn))
+    {
+      if (!NONDEBUG_INSN_P (insn))
+       continue;
+      /* Bail if there are jumps or calls in between.  */
+      if (!NONJUMP_INSN_P (insn))
+       return false;
+
+      df_ref ref;
+      /* Find a USE of the flags register.  */
+      FOR_EACH_INSN_USE (ref, insn)
+       if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+         return false;
+
+      /* Find a DEF of the flags register.  */
+      FOR_EACH_INSN_DEF (ref, insn)
+       if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+         return false;
+    }
+  return true;
+}
What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.

I think we want to bail out on those to be safe. How are they represented in RTL?

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?

I don't think so. As long as there is no flag clobbering instructions between them we should be ok.

Consider:
r1 := r2 + r3  // arith_insn
r2 := r4 + r5  // arith_insn input modified before cmp_insn
cc := CMP r1 0 // cmp_insn

this would be transformed into
{r1 := r2 + r3 ; cc := CMP (r2 + r3) 0} //parallel
r2 := r4 + r5





+
+/* Given two SET expressions, SET_A and SET_B determine whether they form
+   a recognizable pattern when emitted in parallel.  Return that parallel
+   if so.  Otherwise return NULL.  This tries both permutations of SET_A
+   and SET_B within the PARALLEL.  */
+
+static rtx
+try_validate_parallel (rtx set_a, rtx set_b)
+{
+  rtx par
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+  rtx_insn *seq;
+  start_sequence ();
+  seq = emit_insn (par);
+  end_sequence ();
+  if (recog_memoized (seq) > 0)
+    return par;
+
+  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
+  start_sequence ();
+  seq = emit_insn (par);
+  end_sequence ();
+  return recog_memoized (seq) > 0 ? par : NULL_RTX;
+}
I don't think you need to build up a sequence here.   Can't you just
allocate the INSN container and set its PATTERN?

Yes, I think you're right.

+
+/* For a comparison instruction described by CMP check if it compares a
+   register with zero i.e. it is of the form CC := CMP R1, 0.
+   If it is, find the instruction defining R1 (say I1) and try to create a
+   PARALLEL consisting of I1 and the comparison, representing a flag-setting
+   arithmetic instruction.  Example:
+   I1: R1 := R2 + R3
+   <instructions that don't read the condition register>
+   I2: CC := CMP R1 0
+   I2 can be merged with I1 into:
+   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
+   This catches cases where R1 is used between I1 and I2 and therefore
+   combine and other RTL optimisations will not try to propagate it into
+   I2.  Return true if we succeeded in merging CMP.  */
+
+static bool
+try_merge_compare (struct comparison *cmp)
+{
+  rtx_insn *cmp_insn = cmp->insn;
+
+  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
+    return false;
+  rtx in_a = cmp->in_a;
+  if (!in_a)
+    return false;
Isn't the second IF redundant?  How can !in_a ever be true here given we
tested REG_P (cmp->in_a)?

Err... yes.


+  df_ref use;
+
+  FOR_EACH_INSN_USE (use, cmp_insn)
+    if (DF_REF_REGNO (use) == REGNO (in_a))
+      break;
+  if (!use)
+    return false;
+
+  struct df_link *ref_chain;
+  ref_chain = DF_REF_CHAIN (use);
+  if (!ref_chain || !ref_chain->ref
+      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
+    return false;
So what are you checking for here?  I've got a pretty good guess, but
let's just make it explicit in a comment.

This snippet just uses df to find the instruction that defines in_a.
From dealing with the df infrastructure in the past I found that it needed some of this
NULL-checking. Adding a comment here would be good, I agree.

Thanks,
Kyrill


Generally this looks pretty close.  THe biggest concern I see is I think
you need to verify that the inputs on the arthmetic insn don't change
between the arithmetic and the compare.

jeff

Reply via email to