[PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses

Bin Cheng Mon, 28 Sep 2015 02:43:35 -0700

Hi,
Consider the following RTL dump, taken before the loop invariant pass:

LOOP:
 1482: r1838:DI=0xffffffffffffa880
 1483: r1837:DI=sfp:DI+r1838:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...


The dump of the loop invariant pass is as follows:

;;Set in insn 1471 is invariant (0), cost 4, depends on 
;;Set in insn 1482 is invariant (1), cost 4, depends on 
;;Set in insn 1483 is invariant (2), cost 4, depends on 1
;;Decided to move invariant 0 -- gain 4
;;Decided to move invariant 1 -- gain 4
 
 2034: r2163:DI=0xffffffffffffa880
LOOP:
 1483: r1837:DI=sfp:DI+r2163:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...

Note that insns 1482 and 1483 both compute loop invariant values, but only
1482 is hoisted out of the loop.  Since the computation in 1483 uses sfp, the
final assembly code is even worse: we need one or two more instructions to
compute sfp, depending on the immediate constant value.

The direct reason r1837 isn't hoisted is that its cost is computed as 0
rather than 4.  I believe this is a known issue, and there have been several
attempts to fix it by tuning the famous magic number "3" in loop-invariant.c.
After investigation, I believe the problem lies in the cost computation
rather than in the magic number itself.  That may be one reason those
experiments did not end with good results.

The check below counts an invariant expression's cost only if the
expression is used outside address expressions, or if the address
expression is too expensive.  It carries an implicit assumption: if the
invariant expression is referenced only inside address expressions, it can
be forward propagated into them.  This assumption does not always hold,
especially on targets with limited addressing modes.

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses)
    (*comp_cost) += inv->cost * inv->eqno;
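To make the effect concrete, here is a minimal standalone C model of that condition. The structs are hypothetical stand-ins that keep only the fields the check reads; they are not GCC's real struct def or struct invariant. Plugging in r1837's situation (one use, which is an address use, a cheap address, cost 4 from the dump, and an assumed eqno of 1) shows why nothing gets counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the loop-invariant.c fields read
   by the cost check; NOT GCC's real struct definitions.  */
struct def_model
{
  unsigned n_uses;       /* Number of uses of the invariant's value.  */
  unsigned n_addr_uses;  /* How many of those uses sit inside addresses.  */
};

struct inv_model
{
  bool cheap_address;    /* True if the address form is considered cheap.  */
  int cost;              /* Cost of computing the invariant.  */
  unsigned eqno;         /* Number of invariants with the same value.  */
  struct def_model *def;
};

/* Mirror of the original condition: the cost is counted only when the
   address is expensive, the value has no uses, or it has a non-address
   use.  */
static int
original_comp_cost (const struct inv_model *inv)
{
  int comp_cost = 0;

  assert (inv->def->n_addr_uses <= inv->def->n_uses);
  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses)
    comp_cost += inv->cost * inv->eqno;
  return comp_cost;
}
```

With n_uses = 1 and n_addr_uses = 1 the function returns 0; a value that also had a non-address use (for example n_uses = 2, n_addr_uses = 1) would get the full cost of 4 counted.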

Look at the example: r1837, computed in insn 1483, is used in the address
expression of insn 1488, but it can't be forward propagated into it because
"r870 + sfp + 0xffffffffffffa880" isn't a valid address expression on
aarch64.  This means r1837 has to be computed by a separate instruction, and
its cost should be counted.
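The addressing-mode argument can be illustrated with a deliberately simplified, hypothetical model of aarch64 addresses: only [base], [base, index] and [base, #imm] are accepted, while immediate ranges and extended/scaled index registers are ignored. The name model_valid_aarch64_address is made up for illustration. Under that assumption, the original address r870 + r1837 is accepted, while the propagated form r870 + sfp + imm is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of AArch64 address shapes.  Real AArch64
   also has scaled/extended index registers and several immediate ranges;
   only the number and kind of terms matter here.  */
enum addr_term_kind { TERM_REG, TERM_IMM };

struct addr_term
{
  enum addr_term_kind kind;
  long imm;                     /* Only meaningful for TERM_IMM.  */
};

/* Return true if the sum of the N terms is an address form accepted by the
   model: [base], [base, index] or [base, #imm].  */
static bool
model_valid_aarch64_address (const struct addr_term *terms, int n)
{
  int regs = 0, imms = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      if (terms[i].kind == TERM_REG)
        regs++;
      else
        imms++;
    }

  if (regs == 1 && imms == 0)
    return true;                /* [base] */
  if (regs == 2 && imms == 0)
    return true;                /* [base, index] */
  if (regs == 1 && imms == 1)
    return true;                /* [base, #imm]; range not checked.  */
  return false;                 /* e.g. base + index + imm.  */
}
```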

IMHO, we need to track whether a loop invariant expression can be
propagated into its address uses, and use that information to compute the
cost, as below:

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses
      || inv->def->cant_prop_to_addr_uses)
    (*comp_cost) += inv->cost * inv->eqno;
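A standalone sketch of the proposed condition, again using hypothetical simplified structs rather than GCC's real ones, shows the intended change: when the new flag is set, r1837's cost of 4 (taken from the dump, with eqno assumed to be 1) is counted instead of being dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT GCC's real definitions.  */
struct def_model
{
  unsigned n_uses;
  unsigned n_addr_uses;
  bool cant_prop_to_addr_uses;  /* Set if some address use rejected the inv.  */
};

struct inv_model
{
  bool cheap_address;
  int cost;
  unsigned eqno;
  struct def_model *def;
};

/* Mirror of the proposed condition, with the extra flag check.  */
static int
patched_comp_cost (const struct inv_model *inv)
{
  int comp_cost = 0;

  assert (inv->def->n_addr_uses <= inv->def->n_uses);
  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses
      || inv->def->cant_prop_to_addr_uses)
    comp_cost += inv->cost * inv->eqno;
  return comp_cost;
}
```

When every address use accepts the propagated expression, the flag stays false and the result is unchanged from the original check.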

Although this patch could be improved by analyzing address expression
propagation more precisely, experiments show that it already improves
spec2k/fp on aarch64.  I will collect spec2k6 data later, but would like to
start the discussion before my holiday.  I also collected spec2k6 data on
x86_64: no regressions.

Bootstrapped and tested on x86_64 and x86_32.  I will test it on aarch64
too.  Any comments?

Thanks,
bin

2015-09-28  Bin Cheng  <bin.ch...@arm.com>

        * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
        (inv_cant_prop_to_addr_use): New function.
        (record_use): Call inv_cant_prop_to_addr_use, set the new field.
        (get_inv_cost): Count cost if inv can't be propagated into its
        address uses.

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..3c2395c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses;             /* Number of such uses.  */
   unsigned n_addr_uses;                /* Number of uses in addresses.  */
   unsigned invno;              /* The corresponding invariant.  */
+  bool cant_prop_to_addr_uses; /* True if the corresponding inv can't be
+                                  propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, return true if the corresponding
+   invariant expr can't be propagated into the use.  */
+
+static bool
+inv_cant_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+  /* No need to check if address expression is expensive.  */
+  if (!inv->cheap_address)
+    return true;
+
+  def_insn = inv->insn;
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return true;
+
+  validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
+  ok = verify_changes (0);
+  cancel_changes (0);
+  return !ok;
+}
+
 /* Record USE at DEF.  */
 
 static void
@@ -777,7 +807,11 @@ record_use (struct def *def, df_ref use)
   def->uses = u;
   def->n_uses++;
   if (u->addr_use_p)
-    def->n_addr_uses++;
+    {
+      def->n_addr_uses++;
+      if (!def->cant_prop_to_addr_uses && inv_cant_prop_to_addr_use (def, use))
+       def->cant_prop_to_addr_uses = true;
+    }
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -1158,7 +1192,9 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed,
 
   if (!inv->cheap_address
       || inv->def->n_uses == 0
-      || inv->def->n_addr_uses < inv->def->n_uses)
+      || inv->def->n_addr_uses < inv->def->n_uses
+      /* Count cost if the inv can't be propagated into address uses.  */
+      || inv->def->cant_prop_to_addr_uses)
     (*comp_cost) += inv->cost * inv->eqno;
 
 #ifdef STACK_REGS
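
For readers less familiar with recog.c change groups: inv_cant_prop_to_addr_use queues the substitution with validate_unshare_change, checks it with verify_changes and always undoes it with cancel_changes, so the use insn is never actually modified, and record_use keeps the resulting flag sticky across address uses. The toy model below mimics that flow outside GCC. Everything in it (toy_addr, toy_addr_valid, toy_cant_prop, toy_record_addr_uses and the legality rule) is hypothetical; an address is reduced to a count of register and immediate terms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy address: only the number of register and immediate
   terms is tracked.  */
struct toy_addr
{
  int regs;
  int imms;
};

/* Toy legality rule for a target with only [base], [base, #imm] and
   [base, index] addressing.  */
static bool
toy_addr_valid (struct toy_addr a)
{
  return (a.regs == 1 && a.imms <= 1) || (a.regs == 2 && a.imms == 0);
}

/* Tentatively replace one register term of *USE with the terms of SRC,
   check the result, then restore *USE (like cancel_changes).  Returns
   true if the propagation would be invalid.  */
static bool
toy_cant_prop (struct toy_addr *use, struct toy_addr src)
{
  struct toy_addr saved = *use;        /* Remember the original form.  */
  bool ok;

  assert (use->regs >= 1);
  use->regs += src.regs - 1;           /* Tentative substitution.  */
  use->imms += src.imms;
  ok = toy_addr_valid (*use);          /* Like verify_changes.  */
  *use = saved;                        /* Like cancel_changes.  */
  return !ok;
}

/* Mirror record_use: the flag is sticky, so one failing address use is
   enough to make the invariant's cost count.  */
static bool
toy_record_addr_uses (struct toy_addr *uses, size_t n, struct toy_addr src)
{
  bool cant_prop = false;

  for (size_t i = 0; i < n; i++)
    if (!cant_prop && toy_cant_prop (&uses[i], src))
      cant_prop = true;
  return cant_prop;
}
```

In the example, the use [r870 + r1837] has two register terms and r1837 = sfp + imm has one register and one immediate, so the tentative form has two registers plus an immediate and is rejected, while the use itself is left unchanged.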
