Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate components

Segher Boessenkool Fri, 09 Sep 2016 14:58:16 -0700

On Thu, Sep 08, 2016 at 12:34:07PM -0600, Jeff Law wrote:
> On 07/31/2016 07:42 PM, Segher Boessenkool wrote:
> >
> >Deciding what blocks should run with a certain component active so that
> >the total cost of executing the prologues (and epilogues) is optimal, is
> >not a computationally feasible problem.
> Really?  It's just a dataflow problem is it not and one that ought to 
> converge very quickly I'd think.  Or is it more a function of having to 
> run it on so many independent components?  I'm still pondering the value 
> of having every GPR be an independent component :-)


The cost function (as a function of which BBs runs with a certain component
enabled) is not monotonic: the cost for having a component active for a
subset of BBs can be higher, the same, or equal to the cost for all such
nodes (where "cost" means "how often do we execute this prologue").

> >Now all that is left is inserting prologues and epilogues on all edges
> >that jump into resp. out of the "active" set of blocks.  Often we need
> >to insert some components' prologues (or epilogues) on all edges into
> >(or out of) a block.  In theory cross-jumping can unify all such, but
> >in practice that often fails; besides, that is a lot of work.  So in
> >this case we insert the prologue and epilogue components at the "head"
> >or "tail" of a block, instead.
> Cross jumping is rather simplistic, so I'm not surprised that it doesn't 
> catch all this stuff.

I hoped it would, so I could have so much simpler code.  Sniff.

> >As a final optimisation, if a block needs a prologue and its immediate
> >dominator has the block as a post-dominator, the dominator gets the
> >prologue as well.
> So why not just put it in the idom and not in the dominated block? 

That's what it does :-)


> > void
> > thread_prologue_and_epilogue_insns (void)
> > {
> >+  if (optimize > 1)
> >+    {
> >+      df_live_add_problem ();
> >+      df_live_set_all_dirty ();
> >+    }
> Perhaps conditional on separate shrink wrapping?

Actually, I think we need to do this once more, one of the times always
(also when only using "regular" shrink-wrapping), because otherwise the
info in the dump file is out of whack.  Everything is okay once the next
pass starts, of course.  I'll have a look.

> >@@ -5932,6 +5936,13 @@ thread_prologue_and_epilogue_insns (void)
> >
> >   try_shrink_wrapping (&entry_edge, prologue_seq);
> >
> >+  df_analyze ();
> >+  try_shrink_wrapping_separate (entry_edge->dest);
> >+  if (crtl->shrink_wrapped_separate)
> >+    prologue_seq = make_prologue_seq ();
> Perhaps push the df_analyze call into try_shrink_wrapping_separate?

Yeah possibly.

> ANd if it was successful, do you need to update the DF information?  Is 
> that perhaps the cause of some of the issues we're seeing with DCE, 
> regrename, the scheduler?

See above.  The DF info is correct when the next pass starts (or ends,
I do not remember; the dump file does show correct information).

> Consider allowing the target to provide a mapping from the component to 
> a symbolic name of some kind and using that in the dumps.

Yeah that will help, once we do more than just GPRs anyway -- not everyone
remembers the GCC register #s for all registers ;-)

> Related, 
> consider using an enum rather than magic constants in the target bits (I 
> noted seeing component #0 as a magic constant in the ppc implementation).

But 0 is the hard register used there :-)

And since we now are C++, I cannot use enums as integers.  Sigh.

The meaning of "0" should of course be documented.

> So it seems like there's a toplevel list of components that's owned by 
> the target and state at each block components that are owned by the 
> generic code.  That's fine.  Just make sure we doc that the toplevel 
> list of components is allocated by the backend (and where does it get 
> freed?)

Every sbitmap is owned by the separate shrink-wrapping pass, not by the
target.  As the documentation says:

+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS (sbitma
+Mark the components in the parameter as handled, so that the
+@code{prologue} and @code{epilogue} named patterns know to ignore those
+components.  The target code should not hang on to the @code{sbitmap}, it
+will be deleted after this call.
+@end deftypefn

> Consider using auto_sbitmap rather than manually managing 
> allocation/releasing of the per-block structures.  In fact, can't all of 
> SW become a class and we lose the explicit init/fini routines in favor 
> of a ctor/dtor?

Yes, you can always add indirection.  I do not think the code becomes
more readable that way (quite the opposite).  Explicit is *good*.

The init/fini code is small, and that is not an accident.

> >+      /* Find which prologue resp. epilogue components are needed for all
> >+     predecessor edges to this block.  */
> Nit.  Avoid ".resp".  I actually had to look that abbreviation up :-) 
> Being a bit more verbose in the comments would be preferred over ".resp".

"resp." is very common in mathematics papers.  It means either "and" or
"or", and it magically means either or both *and* exactly the thing you
mean there.

But I'll try to use it less often, sure.

> Having looked at this in more detail now, I'm wondering if we want to 
> talk at all in the docs about when selection vs disqualifying happens. 
> ISTM that we build the set of components we want to insert for each 
> edge/block.  When that's complete, we then prune those results with the 
> disqualifying sets.

We decide which blocks should have which components active.  From that,
it is easy to derive on which edges we need a prologue or an epilogue.
Now if the target says putting some prologue or epilogue on some certain
edge will not work, we just don't do that component at all.  It doesn't
happen very often.  In theory we could be smarter.

> For the PPC R0 vs LR is the only thing that causes disqualification 
> right?

Currently, yes.

> Can't that be handled when we build the set of components we 
> want to insert for each edge/block?  Is there some advantage to handling 
> disqualifications after all the potential insertion points have been 
> handled?

We do not know if an edge needs a prologue, epilogue, or neither, until
we have decided whether *both* ends of that edge want the component active
or not.


Segher

Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate components

Reply via email to