Hi,

For the RTL dump below, taken before the loop invariant pass:

LOOP:
  1482: r1838:DI=0xffffffffffffa880
  1483: r1837:DI=sfp:DI+r1838:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
  1484: r1839:V4SI=r910:V4SI>>const_vector
  1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
  1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
  1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
  1488: [r870:DI+r1837:DI]=r1842:V8HI
  ...

The dump for the loop invariant pass is as below:

;;Set in insn 1471 is invariant (0), cost 4, depends on
;;Set in insn 1482 is invariant (1), cost 4, depends on
;;Set in insn 1483 is invariant (2), cost 4, depends on 1
;;Decided to move invariant 0 -- gain 4
;;Decided to move invariant 1 -- gain 4

  2034: r2163:DI=0xffffffffffffa880
LOOP:
  1483: r1837:DI=sfp:DI+r2163:DI
      REG_DEAD r1838:DI
      REG_EQUAL sfp:DI-0x5780
  1484: r1839:V4SI=r910:V4SI>>const_vector
  1485: r1840:V4SI=r1067:V4SI+r910:V4SI
      REG_DEAD r910:V4SI
  1486: r1841:V4SI=r1840:V4SI>>const_vector
      REG_DEAD r1840:V4SI
  1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
      REG_DEAD r1841:V4SI
      REG_DEAD r1839:V4SI
  1488: [r870:DI+r1837:DI]=r1842:V8HI
  ...

Note that instructions 1482 and 1483 both compute loop invariant values,
but only 1482 is hoisted out of the loop. Since the computation in 1483
uses sfp, the final assembly code is even worse, because we need another
one or two instructions to compute sfp, depending on the immediate
constant value. The direct reason that r1837 isn't hoisted is that its
cost is computed as 0, rather than 4.

I believe this is a known issue, and we have tried more than once to
address it by tuning the famous magic number "3" in loop-invariant.c.
After investigation, I believe the problem lies in the cost computation,
rather than in the magic number itself. Maybe that's one reason those
experiments didn't end with good results.

The check below counts an invariant expression's cost only if the
expression is used outside of an address expression, or if the address
expression is too expensive. There is an implicit assumption in it: if
the invariant expression is not referred to outside of address
expressions, it can be forward propagated into those address expressions.
But this assumption is not always true, especially on targets with
limited addressing modes.

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses)
    (*comp_cost) += inv->cost * inv->eqno;

Look at the example: r1837, computed in insn 1483, is used in the address
expression in insn 1488, but it can't be forward propagated into it
because "r870 + sfp + 0xffffa880" isn't a valid address expression on
aarch64. This means r1837 has to be computed by an independent
instruction, and its cost should be counted.

IMHO, we need to track whether a loop invariant expression can or can't
be propagated into its address uses, and use that information when
computing the cost, as below:

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses
      || inv->def->cant_prop_to_addr_uses)
    (*comp_cost) += inv->cost * inv->eqno;

Though this patch could be improved by analyzing address expression
propagation more precisely, experiments show spec2k/fp is already
improved on aarch64. I will collect data for spec2k6 later, but would
like to start the discussion before my holiday. I also collected spec2k6
on x86_64, with no regression.

Bootstrapped and tested on x86_64 and x86_32. Will test it on aarch64.

So any comments?

Thanks,
bin

2015-09-28  Bin Cheng  <bin.ch...@arm.com>

    * loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
    (inv_cant_prop_to_addr_use): New function.
    (record_use): Call inv_cant_prop_to_addr_use, set the new field.
    (get_inv_cost): Count cost if inv can't be propagated into its
    address uses.

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..3c2395c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses;              /* Number of such uses.  */
   unsigned n_addr_uses;         /* Number of uses in addresses.  */
   unsigned invno;               /* The corresponding invariant.  */
+  bool cant_prop_to_addr_uses;  /* True if the corresponding inv can't be
+                                   propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, check if the corresponding
+   invariant expr can be propagated into the use or not.  */
+
+static bool
+inv_cant_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+  /* No need to check if address expression is expensive.  */
+  if (!inv->cheap_address)
+    return true;
+
+  def_insn = inv->insn;
+  def_set = single_set (def_insn);
+  if (!def_set)
+    return true;
+
+  validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
+  ok = verify_changes (0);
+  cancel_changes (0);
+  return !ok;
+}
+
 /* Record USE at DEF.  */
 
 static void
@@ -777,7 +807,11 @@ record_use (struct def *def, df_ref use)
   def->uses = u;
   def->n_uses++;
   if (u->addr_use_p)
-    def->n_addr_uses++;
+    {
+      def->n_addr_uses++;
+      if (!def->cant_prop_to_addr_uses && inv_cant_prop_to_addr_use (def, use))
+        def->cant_prop_to_addr_uses = true;
+    }
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -1158,7 +1192,9 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed,
 
   if (!inv->cheap_address
       || inv->def->n_uses == 0
-      || inv->def->n_addr_uses < inv->def->n_uses)
+      || inv->def->n_addr_uses < inv->def->n_uses
+      /* Count cost if the inv can't be propagated into address uses.  */
+      || inv->def->cant_prop_to_addr_uses)
     (*comp_cost) += inv->cost * inv->eqno;
 
 #ifdef STACK_REGS
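
For readers who want to see the effect of the get_inv_cost condition in isolation, here is a minimal standalone sketch. It is not GCC code: struct toy_inv and counted_cost are hypothetical names that only mirror the fields the check reads, and the use_new_check flag is an assumption added to compare the old and new predicates side by side.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fields read by the check in
   get_inv_cost.  This is NOT the real GCC data structure.  */
struct toy_inv
{
  bool cheap_address;           /* Address expression is cheap.  */
  unsigned n_uses;              /* Number of uses.  */
  unsigned n_addr_uses;         /* Number of uses in addresses.  */
  bool cant_prop_to_addr_uses;  /* Field added by the patch.  */
  int cost;                     /* Cost of computing the invariant.  */
  unsigned eqno;                /* Multiplier used by the real check.  */
};

/* Return the cost that the check would add to *comp_cost.  With
   USE_NEW_CHECK false this models the condition before the patch,
   with it true the condition after the patch.  */
static int
counted_cost (const struct toy_inv *inv, bool use_new_check)
{
  if (!inv->cheap_address
      || inv->n_uses == 0
      || inv->n_addr_uses < inv->n_uses
      || (use_new_check && inv->cant_prop_to_addr_uses))
    return inv->cost * (int) inv->eqno;

  /* Otherwise the invariant is assumed to fold into its address uses.  */
  return 0;
}
```

Modelling insn 1483 as { true, 1, 1, true, 4, 1 } (a cheap address, one use that is an address use, not propagatable, cost 4, and an assumed eqno of 1), the old condition yields 0 and the new one yields 4, which matches the "0, rather than 4" observation in the description.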