On 04/11/2012 10:35 AM, Bernd Schmidt wrote:
On 12/23/2011 05:31 PM, Vladimir Makarov wrote:
On 12/21/2011 09:09 AM, Bernd Schmidt wrote:
This patch was an experiment to see if we can get the same improvement
with modifications to IRA, making it more tolerant of over-aggressive
scheduling. The idea is that if an instruction sets a register A, and
all its inputs are live and unmodified for the lifetime of A, then
moving the instruction downwards towards its first use is going to be
beneficial from a register pressure point of view.
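
To make the condition above concrete, here is a minimal sketch on a toy straight-line block. It is not the patch's IRA code: the Insn tuple, the helper names and the live_out parameter are all assumptions made up for illustration, and real RTL has many more cases (cc0, memory, calls, multiple basic blocks).

```python
from typing import NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    """Hypothetical toy instruction: dest = op(srcs)."""
    dest: str
    srcs: Tuple[str, ...]

def lifetime(insns, i):
    """Return (first_use, last_use) of the register set by insns[i], or None."""
    reg = insns[i].dest
    uses = []
    for j in range(i + 1, len(insns)):
        if reg in insns[j].srcs:
            uses.append(j)
        if insns[j].dest == reg:      # redefinition ends this lifetime
            break
    return (uses[0], uses[-1]) if uses else None

def live_from(insns, reg, pos, live_out):
    """True if reg is read at or after pos before being overwritten."""
    for j in range(pos, len(insns)):
        if reg in insns[j].srcs:
            return True
        if insns[j].dest == reg:
            return False
    return reg in live_out

def can_sink(insns, i, live_out=frozenset()) -> Optional[int]:
    """Index of the first use to sink insns[i] in front of, or None."""
    span = lifetime(insns, i)
    if span is None:
        return None
    first, last = span
    for src in insns[i].srcs:
        # Input must not be modified while A is live ...
        if any(insns[j].dest == src for j in range(i + 1, last)):
            return None
        # ... and must stay live anyway, so sinking costs no extra register.
        if not live_from(insns, src, last, live_out):
            return None
    return first

block = [
    Insn("a", ("x", "y")),   # candidate: a = x op y
    Insn("t", ("p",)),
    Insn("u", ("t", "q")),
    Insn("r", ("a", "x")),   # first (and only) use of a
    Insn("s", ("r", "y")),
]
print(can_sink(block, 0))
```

In this toy block, x and y are needed after the use of a in any case, so delaying the set of a shortens a's live range without lengthening anything else.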
That alone, however, turns out to be too aggressive; performance drops,
presumably because we undo too many scheduling decisions. So, the patch
detects such situations, and splits the pseudo; a new pseudo is
introduced in the original setting instruction, and a copy is added
before the first use. If the new pseudo does not get a hard register, it
is removed again and instead the setting instruction is moved to the
point of the copy.
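
The split-then-undo step described above can be sketched on a toy instruction list as follows. This only illustrates the decision, under assumed names (Insn, split_pseudo, resolve_split, and a hard_regs map standing in for the allocator's result); it does not reflect how the patch actually manipulates RTL.

```python
from typing import NamedTuple, Tuple

class Insn(NamedTuple):
    """Hypothetical toy instruction: dest = op(srcs); one source means a copy."""
    dest: str
    srcs: Tuple[str, ...]

def split_pseudo(insns, set_idx, first_use, new_reg):
    """Set new_reg in the original insn and copy it to the old pseudo
    just before the first use. The copy ends up at index first_use."""
    old = insns[set_idx]
    out = list(insns)
    out[set_idx] = Insn(new_reg, old.srcs)
    out.insert(first_use, Insn(old.dest, (new_reg,)))
    return out

def resolve_split(insns, set_idx, copy_idx, hard_regs):
    """After allocation: keep the split if the new pseudo got a hard
    register; otherwise drop it and move the setter to the copy point."""
    setter, copy = insns[set_idx], insns[copy_idx]
    if hard_regs.get(setter.dest) is not None:
        return list(insns)
    out = list(insns)
    out[copy_idx] = Insn(copy.dest, setter.srcs)  # compute at the copy point
    del out[set_idx]                              # setter no longer needed
    return out

block = [
    Insn("a", ("x", "y")),
    Insn("t", ("p",)),
    Insn("r", ("a", "x")),   # first use of a
]
split = split_pseudo(block, 0, 2, "a_new")
spilled = resolve_split(split, 0, 2, {"a_new": None})
kept = resolve_split(split, 0, 2, {"a_new": "$4"})   # "$4" is a made-up hard reg
for insn in spilled:
    print(insn)
```

If the new pseudo gets a hard register, the scheduler's placement survives at the cost of one copy; if not, the set simply ends up next to its first use.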
This gets up to 6.5% on 456.hmmer on the mips target I was working on;
an embedded benchmark suite also seems to have a (small) geomean
improvement. On x86_64, I've tested spec2k, where specint is unchanged
and specfp has a tiny performance regression. All these tests were done
with a gcc-4.6 based tree.
Thoughts? Currently the patch feels somewhat bolted on to the side of
IRA, maybe there's a nicer way to achieve this?
I think that is an excellent idea. I used an analogous approach for
splitting pseudos in IRA on loop bounds, even if the pseudo gets a hard
register inside and outside the loops. The copies are removed if the live
ranges were not spilled in reload.
I have no problem with this patch. It is just a small change in IRA.
Sounds like you're happier with the patch than I am, so who am I to argue.
Here's an updated version against current trunk, with some cc0 bugfixes
that I've since discovered to be necessary. Bootstrapped and tested (but
not benchmarked again) on i686-linux. Ok?
It is ok. At least it will be useful for gcc4.8.
But I am not sure about the longevity of this code. Since my last email
a lot has changed in the LRA project (which I hope will be ready for
gcc4.9). I've implemented live range splitting which works analogously:
some pseudo ranges are split, and if a split range does not change the
assignment, the pseudo live range split is undone. The difference is that
your approach uses a global view (global RA) and
mine is done locally. So it needs more investigation into how different the
results are. It seems to me that they will complement each other.
I'll probably investigate this when/if LRA is merged.
In any case, thanks, Bernd. It is ok to commit this patch.