http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920
--- Comment #3 from Vladimir Makarov <vmakarov at redhat dot com> 2010-12-14 16:02:09 UTC ---
(In reply to comment #2)
> > To generate the proposed code, we should assign r12 to p63. IRA marks p63
> > conflicting with r12 because DF-infrastructure reports r12 having intersected
> > live ranges with p63.
> >
> > It is possible to solve the problem if we have conflicts based on values (not
> > live ranges). I'd not recommend to do that, because it will slow down RA
> > without visible improvement on majority benchmarks (I did such experiment about
> > 7 years ago and reported about the results on GCC summit in 2004).
>
> One alternative is to rematerialize values that have been copied to a
> hard register before their uses (by inserting an r12:DI=r63:DI before
> the use of r63). This breaks the live ranges of the pseudos and
> facilitates coalescing.
>

I wouldn't call it rematerialization. I think it is closer to live range shrinking (LRS) of the hard register through additional copies. It is an interesting idea (I partially investigated LRS about 6 years ago). I should probably think about this again. Thanks, Paolo.

> > By the way, usage of implicit hard registers in RTL (when it can be avoided.
> > Example when hard registers can be avoided is their usage as call arguments) is
> > very bad idea for RA. I see it a lot such code in x86-64 code. I'd recommend
> > to prevent optimizations before RA to abuse hard register usage.
>
> As I said, the improvement from hard register variable here is 25% on
> x86-64 and probably more (I can collect data) on i386. This testcase
> is distilled from a bytecode interpreter.

Paolo, I did not mean that you should avoid using a hard register in this particular case. I just wrote that I have seen a lot of x86-64 code where hard registers were propagated, and that is bad for RA. I never had an opportunity to investigate which optimization does it.

Again by the way :).
My experience implementing interpreters is that computed gotos do not work well on modern out-of-order (OOO) processors, especially when there are many such labels, because of worse branch prediction. I found that a switch statement works better. But I guess rewriting the interpreter is not your goal.
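For readers unfamiliar with the two dispatch styles being compared, here is a minimal sketch of both for a hypothetical three-opcode stack bytecode (`OP_PUSH`, `OP_ADD`, `OP_HALT`, `run_switch` and `run_goto` are illustrative names, not taken from the testcase). The computed-goto version relies on GCC's "labels as values" extension (`&&label`, `goto *ptr`). The sketch only shows the structural difference at the source level; it makes no performance claim, and which style predicts better depends on the processor and on how the compiler lays out the indirect jumps.

```c
#include <stddef.h>

/* Hypothetical bytecode: OP_PUSH <imm>, OP_ADD, OP_HALT. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Switch dispatch: at the source level, all opcodes go through the
   single indirect jump generated for the switch. */
long run_switch(const long *pc)
{
    long stack[16];
    size_t sp = 0;

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:
            stack[sp++] = *pc++;        /* push the immediate operand */
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp]; /* pop two, push their sum */
            break;
        case OP_HALT:
            return stack[sp - 1];       /* result is the top of stack */
        }
    }
}

/* Computed-goto dispatch (GNU C extension): at the source level, each
   handler ends with its own indirect jump through the label table. */
long run_goto(const long *pc)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_halt };
    long stack[16];
    size_t sp = 0;

    goto *labels[*pc++];
do_push:
    stack[sp++] = *pc++;
    goto *labels[*pc++];
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *labels[*pc++];
do_halt:
    return stack[sp - 1];
}
```

Both functions interpret the same program, e.g. `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT }`, and return its result; only the dispatch mechanism differs.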
