On Thu, 26 Jun 2014, Richard Biener wrote: > On Thu, 26 Jun 2014, Richard Biener wrote: > > > On Wed, 25 Jun 2014, Jeff Law wrote: > > > > > On 06/25/14 08:05, Richard Biener wrote: > > > > > > > > This removes restrictions in DOM cprop_operand that inhibit > > > > some optimizations. The volatile pointer thing is really realy > > > > old and no longer necessary while the loop-depth consideration > > > > is only valid for loop-closed PHI nodes (but we're not in > > > > loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA > > > > phase by inserting copies appropriately. > > > > > > > > Bootstrapped on x86_64-unknown-linux-gnu, ok? > > > > > > > > Thanks, > > > > Richard. > > > > > > > > 2014-06-25 Richard Biener <rguent...@suse.de> > > > > > > > > PR tree-optimization/61607 > > > > * tree-ssa-dom.c (cprop_operand): Remove restriction on > > > > propagating volatile pointers and on loop depth. > > > The first hunk is OK. > > > > > > I thought we had tests for the do not copy propagate out of a loop nest > > > in the > > > suite. Did you check that tests in BZ 19038 still generate good code > > > after > > > this change? If we still generate good code for those tests, then this > > > hunk > > > is fine too. > > > > I have applied the first hunk and will investigate further. Testing > > didn't show any issue and I know how to retain the check but not > > cause the missed optimization shown in PR61607. > > Let's try to summarize what the restriction is supposed to avoid. > It tries to avoid introducing uses of SSA names defined inside a > loop outside of it because if the SSA name is live over the backedge > we will then have an overlapping life-range which prevents out-of-SSA > from coalescing it to a single register. > > Now, the existing test is not working in that way. > > Rather the best way we have to ensure this property (all outside > uses go through a copy that is placed on exit edges rather than > possibly on the backedge) is to go into loop-closed SSA form. > This is also where the PHI nodes that confuse DOM in PR61607 > come from in the first place. > > Now as the existing measure is ineffective in some cases out-of-SSA > has gotten the ability to deal with this (or a subset): > > /* If elimination of a PHI requires inserting a copy on a backedge, > then we will have to split the backedge which has numerous > undesirable performance effects. > > A significant number of such cases can be handled here by inserting > copies into the loop itself. */ > insert_backedge_copies (); > > now, this doesn't seem to deal with outside uses. But eventually > the coalescing code already assigns proper cost to backedge copies > so that we choose to place copies on the exit edges rather than > the backedge ones - seems not so from looking at coalesce_cost_edge. > > So I think that we should remove the copy-propagation restrictions > and instead address this in out-of-SSA. > > For now the following patch retains the exact same restriction in > DOM as it is present in copyprop (but not in FRE - ok my recent fault, > or in VRP). By avoiding to record the equivalency for PHIs > (where we know that either all or no uses should be covered by > the loop depth check) we retain the ability to record the equivalency > for the two loop exit PHI nodes and thus the threading (if only > on the false path). > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > I'll try to see what happens to the PR19038 testcases (though > that PR is a mess ...)
I checked the very original one (thin6d.f from sixtrack) and the generated assembly for -Ofast is the same without any patch and with _all_ loop_depth_of_name restrictions removed from both DOM and copyprop (thus making loop_depth_of_name dead). The cost of out-of-SSA copies for backedges (or in the case of the PR, loop latch edges causing an edge split) is dealt with by /* Inserting copy on critical edge costs more than inserting it elsewhere. */ if (EDGE_CRITICAL_P (e)) mult = 2; in coalesce_cost_edge. So in the end, without a testcase to investigate, I'd propose to get rid of those restrictions. I'm still going forward with the patch below for now. Richard. > Richard. > > 2014-06-26 Richard Biener <rguent...@suse.de> > > PR tree-optimization/61607 > * tree-ssa-copy.c (copy_prop_visit_phi_node): Adjust comment > explaining why we restrict copies on loop depth. > * tree-ssa-dom.c (cprop_operand): Remove restriction on > on loop depth. > (record_equivalences_from_phis): Instead add it here. > > * gcc.dg/tree-ssa/ssa-dom-thread-5.c: New testcase. > > Index: gcc/tree-ssa-copy.c > =================================================================== > --- gcc/tree-ssa-copy.c (revision 212012) > +++ gcc/tree-ssa-copy.c (working copy) > @@ -401,11 +401,8 @@ copy_prop_visit_phi_node (gimple phi) > arg_value = valueize_val (arg); > > /* Avoid copy propagation from an inner into an outer loop. > - Otherwise, this may move loop variant variables outside of > - their loops and prevent coalescing opportunities. If the > - value was loop invariant, it will be hoisted by LICM and > - exposed for copy propagation. > - ??? The value will be always loop invariant. > + Otherwise, this may introduce uses of loop variant variables > + outside of their loops and prevent coalescing opportunities. > In loop-closed SSA form do not copy-propagate through > PHI nodes in blocks with a loop exit edge predecessor. */ > if (TREE_CODE (arg_value) == SSA_NAME > Index: gcc/tree-ssa-dom.c > =================================================================== > --- gcc/tree-ssa-dom.c (revision 212013) > +++ gcc/tree-ssa-dom.c (working copy) > @@ -1234,7 +1234,13 @@ record_equivalences_from_phis (basic_blo > this, since this is a true assignment and not an equivalence > inferred from a comparison. All uses of this ssa name are dominated > by this assignment, so unwinding just costs time and space. */ > - if (i == gimple_phi_num_args (phi) && may_propagate_copy (lhs, rhs)) > + if (i == gimple_phi_num_args (phi) > + && may_propagate_copy (lhs, rhs) > + /* Do not propagate copies if the propagated value is at a deeper loop > + depth than the propagatee. Otherwise, this may introduce uses > + of loop variant variables outside of their loops and prevent > + coalescing opportunities. */ > + && !(loop_depth_of_name (rhs) > loop_depth_of_name (lhs))) > set_ssa_name_value (lhs, rhs); > } > } > @@ -2247,14 +2253,6 @@ cprop_operand (gimple stmt, use_operand_ > if (!may_propagate_copy (op, val)) > return; > > - /* Do not propagate copies if the propagated value is at a deeper loop > - depth than the propagatee. Otherwise, this may move loop variant > - variables outside of their loops and prevent coalescing > - opportunities. If the value was loop invariant, it will be hoisted > - by LICM and exposed for copy propagation. */ > - if (loop_depth_of_name (val) > loop_depth_of_name (op)) > - return; > - > /* Do not propagate copies into simple IV increment statements. > See PR23821 for how this can disturb IV analysis. */ > if (TREE_CODE (val) != INTEGER_CST > Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c (working copy) > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-Os -fno-tree-fre -fdump-tree-dom1-details" } */ > + > +void foo(int *); > +void f2(int dst[3], int R) > +{ > + int i, inter[2]; > + _Bool inter0p = 0; > + _Bool inter1p = 0; > + for (i = 1; i < R; i++) > + { > + inter0p = 1; > + inter1p = 1; > + } > + if (inter0p) > + inter[0] = 1; > + if (inter1p) > + inter[1] = 1; > + foo(inter); > +} > + > +/* { dg-final { scan-tree-dump "Threaded jump" "dom1" } } */ > +/* { dg-final { cleanup-tree-dump "dom1" } } */ > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer