[Bug rtl-optimization/46920] suboptimal register allocation with local register variables

vmakarov at redhat dot com Tue, 14 Dec 2010 08:02:49 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920


--- Comment #3 from Vladimir Makarov <vmakarov at redhat dot com> 2010-12-14 16:02:09 UTC ---
(In reply to comment #2)
> > To generate the proposed code, we should assign r12 to p63.  IRA marks p63
> > conflicting with r12 because DF-infrastructure reports r12 having
> > intersected live ranges with p63.
> >
> > It is possible to solve the problem if we have conflicts based on values
> > (not live ranges).  I'd not recommend to do that, because it will slow down
> > RA without visible improvement on majority benchmarks (I did such experiment
> > about 7 years ago and reported about the results on GCC summit in 2004).
> 
> One alternative is to rematerialize values that have been copied to a
> hard register before their uses (by inserting an r12:DI=r63:DI before
> the use of r63).  This breaks the live ranges of the pseudos and
> facilitates coalescing.
> 

I would not call it rematerialization.  I think it is more like live range
shrinking (LRS) of hard registers through additional copies.  It is an
interesting idea (I partially investigated LRS about 6 years ago).  I should
probably think about this again.  Thanks, Paolo.

> > By the way, usage of implicit hard registers in RTL (when it can be avoided.
> > Example when hard registers can be avoided is their usage as call
> > arguments) is very bad idea for RA.  I see it a lot such code in x86-64
> > code.  I'd recommend to prevent optimizations before RA to abuse hard
> > register usage.
> 
> As I said, the improvement from hard register variable here is 25% on
> x86-64 and probably more (I can collect data) on i386.  This testcase
> is distilled from a bytecode interpreter.

Paolo, I did not mean that you should avoid using a hard register in this
particular case.  I just wrote that I have seen a lot of x86-64 code where hard
registers were propagated, and that is bad for RA.  I never had an opportunity
to investigate which optimization does it.

Again, by the way :).  My experience implementing interpreters shows me that
computed gotos do not work well (especially when there are a lot of such
labels) on modern OOO processors because of worse branch prediction.  I found
that a switch statement works better.  But I guess it is not your goal to
rewrite the interpreter.
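
For readers who have not seen the two dispatch styles side by side, here is a
minimal sketch in GNU C.  The three opcodes, the function names and the sample
program are hypothetical and made up for illustration; this is not the
interpreter from the bug report.  Computed gotos rely on the GCC "labels as
values" extension.

```c
#include <assert.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Switch dispatch: every opcode goes through the same dispatch point. */
long run_switch(const unsigned char *pc)
{
    long acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:  acc += 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_HALT: return acc;
        default:      assert(!"unknown opcode"); return acc;
        }
    }
}

/* Computed-goto dispatch (GNU C extension): every handler ends with its
   own indirect jump to the next handler.  No bounds check on the opcode. */
long run_goto(const unsigned char *pc)
{
    static void *const handlers[] = { &&do_inc, &&do_dbl, &&do_halt };
    long acc = 0;

    goto *handlers[*pc++];
do_inc:
    acc += 1;
    goto *handlers[*pc++];
do_dbl:
    acc *= 2;
    goto *handlers[*pc++];
do_halt:
    return acc;
}

/* Sample program: ((0 + 1 + 1) * 2) + 1 */
static const unsigned char demo_program[] = {
    OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT
};
```

Both functions return the same value for demo_program.  Which style is faster
depends on the processor, the compiler and the number of handlers, so the
sketch is only a starting point for measuring on the target of interest.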
