What I've observed on Power is that LTO alone reduces performance, and
LTO+FDO is not significantly different from FDO alone.

I agree that an exact estimate of the register pressure would be a
difficult problem. I'm hoping that something that approximates potential
register pressure downstream will be sufficient to help inlining
decisions. 
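
To make the idea concrete, here is a minimal sketch of the kind of
approximation described in the original message below: a backward liveness
analysis over basic blocks that reports the maximum number of names live at
any point. It is an illustration only, written in Python rather than against
GCC internals. The block/statement representation and the function name
liveness_pressure are hypothetical, and phi nodes and register classes are
ignored for simplicity.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy IR: a statement is (defined_name or None, [used_names]);
# a basic block is a list of statements; succs maps a block to its successors.
Stmt = Tuple[Optional[str], List[str]]


def _walk_backward(stmts: List[Stmt], live_out: Set[str]) -> Tuple[Set[str], int]:
    """Walk a block bottom-up; return (live-in set, max names live at any point)."""
    live = set(live_out)
    peak = len(live)
    for defined, uses in reversed(stmts):
        if defined is not None:
            live.discard(defined)   # the name is not live above its definition
        live.update(uses)           # operands are live above their use
        peak = max(peak, len(live))
    return live, peak


def liveness_pressure(blocks: Dict[str, List[Stmt]],
                      succs: Dict[str, List[str]]) -> int:
    """Approximate 'liveness pressure': the maximum live-name count in any block."""
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}

    # Standard iterative backward dataflow until nothing changes.
    changed = True
    while changed:
        changed = False
        for b, stmts in blocks.items():
            out: Set[str] = set()
            for s in succs.get(b, []):
                out |= live_in[s]
            new_in, _ = _walk_backward(stmts, out)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True

    # With stable live-out sets, take the peak over all blocks.
    return max((_walk_backward(stmts, live_out[b])[1]
                for b, stmts in blocks.items()), default=0)


# Small example: a loop that keeps a, b and c live across iterations.
example_blocks = {
    "entry": [("a", []), ("b", []), ("c", ["a", "b"])],
    "loop": [("d", ["c", "a"]), ("e", ["d", "b"])],
    "exit": [(None, ["e", "c"])],
}
example_succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
print(liveness_pressure(example_blocks, example_succs))
```

A number like this could be compared against a threshold, or combined for
caller and callee, when deciding whether to inline. How well it tracks real
spill behaviour downstream is exactly the open question.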

  Aaron

On Fri, 2014-04-18 at 10:36 -0700, Xinliang David Li wrote:
> Do you witness similar problems with LTO+FDO?
> 
> My concern is it can be tricky to get the register pressure estimate
> right. The register pressure problem is created by downstream
> components (code motions etc) but only exposed by the inliner.  If you
> want to get it 'right' (i.e., not exposing the problems), you will
> need to bake the knowledge of the downstream components (possibly
> bugs) into the analysis which might not be a good thing to do longer
> term.
> 
> David
> 
> On Fri, Apr 18, 2014 at 9:43 AM, Aaron Sawdey
> <acsaw...@linux.vnet.ibm.com> wrote:
> > Honza,
> >   Seeing your recent patches relating to inliner heuristics for LTO, I
> > thought I should mention some related work I'm doing.
> >
> > By way of introduction, I've recently joined the IBM LTC's PPC Toolchain
> > team, working on gcc performance.
> >
> > We have not generally seen good results using LTO on IBM power processors
> > and one of the problems seems to be excessive inlining that results in the
> > generation of excessive spill code. So, I have set out to tackle this by
> > doing some analysis at the time of the inliner pass to compute something
> > analogous to register pressure, which is then used to shut down inlining of
> > routines that have a lot of pressure.
> >
> > The analysis is basically a liveness analysis on the SSA names per basic
> > block and looking for the maximum number live in any block. I've been using
> > "liveness pressure" as a shorthand name for this.
> >
> > This can then be used in two ways.
> > 1) want_inline_function_to_all_callers_p at present always says to inline
> > things that have only one call site without regard to size or what this may
> > do to the register allocator downstream. In particular, BZ2_decompress in
> > bzip2 gets inlined and this causes the pressure reported downstream for the
> > int register class to increase 10x. Looking at some combination of pressure
> > in caller/callee may help avoid this kind of situation.
> > 2) I also want to experiment with adding the liveness pressure in the callee
> > into the badness calculation in edge_badness used by inline_small_functions.
> > The idea here is to try to inline functions that are less likely to cause
> > register allocator difficulty downstream first.
> >
> > I am just at the point of getting a prototype working; I will post a patch
> > you could take a look at next week. In the meantime, do you have any
> > comments or feedback?
> >
> > Thanks,
> >    Aaron
> >
> > --
> > Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> > 050-2/C113  (507) 253-7520 home: 507/263-0782
> > IBM Linux Technology Center - PPC Toolchain
> >
> 

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
