On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>
> On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
> >
> > Andrew Pinski writes:
> >
> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
> >>>
> >>> James Greenhalgh writes:
> >>>
> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
> >>>>> Current IRA still uses both target macros in a few places.
> >>>>>
> >>>>> Tell IRA to use the order we defined rather than its own cost
> >>>>> calculation. Allocate caller-saved registers first, then callee-saved.
> >>>>>
> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
> >>>>> purely caller-saved when used in a leaf function.
> >>>>>
> >>>>> Haven't noticed a significant impact on benchmarks, but by grepping
> >>>>> some keywords like "Spilling", "Push.*spill" etc. in the IRA RTL dump,
> >>>>> the number is smaller.
> >>>>>
> >>>>> OK for trunk?
> >>>>
> >>>> OK, sorry for the delay.
> >>>>
> >>>> It might be mail client mangling, but please check that the trailing
> >>>> slashes line up in the version that gets committed.
> >>>>
> >>>> Thanks,
> >>>> James
> >>>>
> >>>>> 2015-05-19  Jiong Wang  <jiong.w...@arm.com>
> >>>>>
> >>>>> gcc/
> >>>>>   PR 63521
> >>>>>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
> >>>>>   (HONOR_REG_ALLOC_ORDER): Define.
> >>>
> >>> Patch reverted.
> >>
> >> I did not see a reason why this patch was reverted. Maybe I am
> >> missing an email or something.
> >
> > There are several execution regressions in the gcc testsuite. As far as
> > I can see this patch merely exposed hidden bugs in those testcases, but
> > there might be one other issue, so to be conservative I temporarily
> > reverted this patch.
>
> If you are talking about:
> gcc.target/aarch64/aapcs64/func-ret-2.c execution
> etc.
>
> These test cases are too dependent on the original register allocation
> order and really can be safely ignored. Really, these three tests should
> be moved or written in a more sane way.
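[Editorial sketch: for readers unfamiliar with the two target macros named in the quoted ChangeLog, the fragment below shows the general shape such definitions take in a GCC target header. It is NOT the committed hunk; the register numbers, the truncated list, and the comments are illustrative assumptions only (a real REG_ALLOC_ORDER must enumerate every hard register up to FIRST_PSEUDO_REGISTER).]

```c
/* Illustrative sketch only -- not the actual config/aarch64/aarch64.h hunk.
   REG_ALLOC_ORDER lists hard register numbers in decreasing allocation
   preference; the numbering here is schematic.  */
#define REG_ALLOC_ORDER                                             \
  {                                                                 \
    /* Caller-saved argument registers first...  */                 \
    7, 6, 5, 4, 3, 2, 1, 0,                                         \
    /* ...then the remaining caller-saved registers (x8-x18),       \
       including LR/x30, which is cheap to allocate in a leaf       \
       function, and only then the callee-saved registers.  */      \
    8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 30                    \
    /* (a real definition continues through every hard register) */ \
  }

/* Defining this tells IRA to honor the order above instead of
   computing its own cost-based ordering.  */
#define HONOR_REG_ALLOC_ORDER 1
```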
Yup, completely agreed - but the testcases do throw up something interesting.

If we are allocating registers to hold 128-bit values and we pick x7 as
the highest preference, we implicitly allocate x8 along with it. I think
we would probably see the same thing if the first thing we do in a
function is a structure copy through a back-end expanded movmem, which
will likely begin with a 128-bit LDP using x7, x8.

If the argument for this patch is that we prefer to allocate x7-x0
first, followed by x8, then we've potentially made a sub-optimal
decision: our allocation order for 128-bit values is x7, x8, x5, x6,
etc. My hunch is that we *might* get better code generation in this
corner case out of some permutation of the allocation order for argument
registers. I'm thinking something along the lines of:

  { x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }

I asked Jiong to take a look at that, and I agree with his decision to
reduce the churn on trunk and just revert the patch until we've come to
a conclusion based on some evidence - rather than just my hunch! I agree
that the failures would be harmless on trunk from a testing point of
view, but I think Jiong is right to revert the patch until we better
understand the code-generation implications.

Of course, it might be that I am completely wrong! If you've already
taken a look at using a register allocation order like the example I
gave and have something to share, I'd be happy to read your advice!

Thanks,
James