https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |missed-optimization, ra CC| |amker at gcc dot gnu.org --- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> --- CCing Bin, he was looking into PRE/predcom as well AFAIR. predictive commoning here performs unrolling to be able to avoid some loop-carried dependencies while PRE has the larger distances covered by for example <bb 6> [85.00%]: # prephitmp_656 = PHI <_125(6), pretmp_655(5)> # prephitmp_674 = PHI <prephitmp_656(6), pretmp_673(5)> this kind of loop carried PHIs should be a hint for a tree level unroller to perform unrolling (just in case they literally appear in source for example). OTOH if unrolling can solve the RA problem then the it must be solvable not unrolled as well? Note that with predcom we end up with 11 pointer IVs while with PRE we have just one (but use 20 others from the outer loop...) - possibly the versioning predcom performs makes IVO not do any outer loop IVO. Using -fschedule-insns -fsched-pressure helps somewhat but not much. So it looks like a RA related issue and IVO is as much relevant as PRE doing predictive commoning at -O2 (and at -O3 doing predcoms job but worse in this case). During PHI translation we can tame this down to a level pre this rev. again, for example with the following. But ideally we'd compute antic and do insertion for the full dataflow problem and only apply this "cost modeling" during elimination to not lose secondary level transforms that are profitable (also below we do not know whether we need to insert a PHI for the value in the end). Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c (revision 244484) +++ gcc/tree-ssa-pre.c (working copy) @@ -1465,16 +1465,16 @@ phi_translate_1 (pre_expr expr, bitmap_s { /* For non-CONSTANTs we have to make sure we can eventually insert the expression. Which means we need to have a - leader for it. */ - if (constant->kind != CONSTANT) + leader for it. Avoid doing this across backedges though. */ + if (constant->kind == CONSTANT) + return constant; + else if (! dominated_by_p (CDI_DOMINATORS, pred, phiblock)) { unsigned value_id = get_expr_value_id (constant); constant = find_leader_in_sets (value_id, set1, set2); if (constant) return constant; } - else - return constant; } tree result = vn_nary_op_lookup_pieces (newnary->length, But as said, a whole different question is whether we want PRE to add IVs at all (but we do have some testcases requesting exactly that, for example gcc.dg/tree-ssa/pr71347.c or ssa-pre-23.c requesting store-motion w/o actually sinking the store). Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c (revision 244484) +++ gcc/tree-ssa-pre.c (working copy) @@ -4290,6 +4290,31 @@ eliminate_dom_walker::before_dom_childre VN_INFO_RANGE_INFO (lhs)); } + if (sprime + && TREE_CODE (sprime) == SSA_NAME + && do_pre + && loop_outer (b->loop_father) + && has_zero_uses (sprime) + && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (sprime); + basic_block def_bb = gimple_bb (def_stmt); + if (gimple_code (def_stmt) == GIMPLE_PHI + && def_bb->loop_father->header == def_bb) + { + bool ok = true; + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, def_bb->preds) + if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest) + && TREE_CODE (PHI_ARG_DEF_FROM_EDGE (def_stmt, e)) == SSA_NAME) + ok = false; + /* Don't keep sprime available. */ + if (!ok) + sprime = NULL_TREE; + } + } + /* Inhibit the use of an inserted PHI on a loop header when the address of the memory reference is a simple induction variable. In other cases the vectorizer won't do anything