https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
                 CC|                            |amker at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
CCing Bin, he was looking into PRE/predcom as well AFAIR.  predictive commoning
here performs unrolling to be able to avoid some loop-carried dependencies
while PRE has the larger distances covered by for example

<bb 6> [85.00%]:
# prephitmp_656 = PHI <_125(6), pretmp_655(5)>
# prephitmp_674 = PHI <prephitmp_656(6), pretmp_673(5)>

this kind of loop carried PHIs should be a hint for a tree level unroller
to perform unrolling (just in case they literally appear in source for
example).
OTOH if unrolling can solve the RA problem then the it must be solvable not
unrolled as well?  Note that with predcom we end up with 11 pointer IVs while
with PRE we have just one (but use 20 others from the outer loop...) - possibly
the versioning predcom performs makes IVO not do any outer loop IVO.  Using
-fschedule-insns -fsched-pressure helps somewhat but not much.

So it looks like a RA related issue and IVO is as much relevant as PRE doing
predictive commoning at -O2 (and at -O3 doing predcoms job but worse in this
case).

During PHI translation we can tame this down to a level pre this rev. again,
for example with the following.  But ideally we'd compute antic and do
insertion
for the full dataflow problem and only apply this "cost modeling" during
elimination to not lose secondary level transforms that are profitable
(also below we do not know whether we need to insert a PHI for the value in
the end).

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 244484)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -1465,16 +1465,16 @@ phi_translate_1 (pre_expr expr, bitmap_s
              {
                /* For non-CONSTANTs we have to make sure we can eventually
                   insert the expression.  Which means we need to have a
-                  leader for it.  */
-               if (constant->kind != CONSTANT)
+                  leader for it.  Avoid doing this across backedges though. 
*/
+               if (constant->kind == CONSTANT)
+                 return constant;
+               else if (! dominated_by_p (CDI_DOMINATORS, pred, phiblock))
                  {
                    unsigned value_id = get_expr_value_id (constant);
                    constant = find_leader_in_sets (value_id, set1, set2);
                    if (constant)
                      return constant;
                  }
-               else
-                 return constant;
              }

            tree result = vn_nary_op_lookup_pieces (newnary->length,


But as said, a whole different question is whether we want PRE to add IVs at
all
(but we do have some testcases requesting exactly that, for example
gcc.dg/tree-ssa/pr71347.c or ssa-pre-23.c requesting store-motion w/o actually
sinking the store).

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 244484)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4290,6 +4290,31 @@ eliminate_dom_walker::before_dom_childre
                                               VN_INFO_RANGE_INFO (lhs));
            }

+         if (sprime
+             && TREE_CODE (sprime) == SSA_NAME
+             && do_pre
+             && loop_outer (b->loop_father)
+             && has_zero_uses (sprime)
+             && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime)))
+           {
+             gimple *def_stmt = SSA_NAME_DEF_STMT (sprime);
+             basic_block def_bb = gimple_bb (def_stmt);
+             if (gimple_code (def_stmt) == GIMPLE_PHI
+                 && def_bb->loop_father->header == def_bb)
+               {
+                 bool ok = true;
+                 edge_iterator ei;
+                 edge e;
+                 FOR_EACH_EDGE (e, ei, def_bb->preds)
+                   if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest)
+                       && TREE_CODE (PHI_ARG_DEF_FROM_EDGE (def_stmt, e)) ==
SSA_NAME)
+                     ok = false;
+                 /* Don't keep sprime available.  */
+                 if (!ok)
+                   sprime = NULL_TREE;
+               }
+           }
+
          /* Inhibit the use of an inserted PHI on a loop header when
             the address of the memory reference is a simple induction
             variable.  In other cases the vectorizer won't do anything

Reply via email to