https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
--- Comment #5 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #4) > CCing Bin, he was looking into PRE/predcom as well AFAIR. predictive > commoning here performs unrolling to be able to avoid some loop-carried > dependencies > while PRE has the larger distances covered by for example > > <bb 6> [85.00%]: > # prephitmp_656 = PHI <_125(6), pretmp_655(5)> > # prephitmp_674 = PHI <prephitmp_656(6), pretmp_673(5)> > > this kind of loop carried PHIs should be a hint for a tree level unroller > to perform unrolling (just in case they literally appear in source for > example). > OTOH if unrolling can solve the RA problem then the it must be solvable not > unrolled as well? Note that with predcom we end up with 11 pointer IVs while > with PRE we have just one (but use 20 others from the outer loop...) - > possibly > the versioning predcom performs makes IVO not do any outer loop IVO. Using > -fschedule-insns -fsched-pressure helps somewhat but not much. > > So it looks like a RA related issue and IVO is as much relevant as PRE doing > predictive commoning at -O2 (and at -O3 doing predcoms job but worse in this > case). > > During PHI translation we can tame this down to a level pre this rev. again, > for example with the following. But ideally we'd compute antic and do > insertion > for the full dataflow problem and only apply this "cost modeling" during > elimination to not lose secondary level transforms that are profitable > (also below we do not know whether we need to insert a PHI for the value in > the end). > > Index: gcc/tree-ssa-pre.c > =================================================================== > --- gcc/tree-ssa-pre.c (revision 244484) > +++ gcc/tree-ssa-pre.c (working copy) > @@ -1465,16 +1465,16 @@ phi_translate_1 (pre_expr expr, bitmap_s > { > /* For non-CONSTANTs we have to make sure we can eventually > insert the expression. Which means we need to have a > - leader for it. */ > - if (constant->kind != CONSTANT) > + leader for it. Avoid doing this across backedges though. > */ > + if (constant->kind == CONSTANT) > + return constant; > + else if (! dominated_by_p (CDI_DOMINATORS, pred, phiblock)) > { > unsigned value_id = get_expr_value_id (constant); > constant = find_leader_in_sets (value_id, set1, set2); > if (constant) > return constant; > } > - else > - return constant; > } > > tree result = vn_nary_op_lookup_pieces (newnary->length, > > > But as said, a whole different question is whether we want PRE to add IVs at > all > (but we do have some testcases requesting exactly that, for example > gcc.dg/tree-ssa/pr71347.c or ssa-pre-23.c requesting store-motion w/o > actually sinking the store). > > Index: gcc/tree-ssa-pre.c > =================================================================== > --- gcc/tree-ssa-pre.c (revision 244484) > +++ gcc/tree-ssa-pre.c (working copy) > @@ -4290,6 +4290,31 @@ eliminate_dom_walker::before_dom_childre > VN_INFO_RANGE_INFO (lhs)); > } > > + if (sprime > + && TREE_CODE (sprime) == SSA_NAME > + && do_pre > + && loop_outer (b->loop_father) > + && has_zero_uses (sprime) > + && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))) > + { > + gimple *def_stmt = SSA_NAME_DEF_STMT (sprime); > + basic_block def_bb = gimple_bb (def_stmt); > + if (gimple_code (def_stmt) == GIMPLE_PHI > + && def_bb->loop_father->header == def_bb) > + { > + bool ok = true; > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, def_bb->preds) > + if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest) > + && TREE_CODE (PHI_ARG_DEF_FROM_EDGE (def_stmt, e)) > == SSA_NAME) > + ok = false; > + /* Don't keep sprime available. */ > + if (!ok) > + sprime = NULL_TREE; > + } > + } > + > /* Inhibit the use of an inserted PHI on a loop header when > the address of the memory reference is a simple induction > variable. In other cases the vectorizer won't do anything I am trying to model register pressure and use that information to direct predcom. So far it detects only one case 436.cactusADM but does improve a lot. Though it's hard to model cost of pre_expr, but for loop carries ones, we may be able to simply control the number using pressure information too. According to the report, this is a register pressure issue caused by PRE, I will check if that can help. Thanks.