On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
> > 
> > Andrew Pinski writes:
> > 
> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
> >>> 
> >>> James Greenhalgh writes:
> >>> 
> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
> >>>>> Current IRA still uses both target macros in a few places.
> >>>>> 
> >>>>> Tell IRA to use the order we define rather than its own cost
> >>>>> calculation.  Allocate caller-saved registers first, then callee-saved.
> >>>>> 
> >>>>> This is especially useful for LR/x30, as it is free to allocate and is
> >>>>> purely caller-saved when used in a leaf function.
> >>>>> 
> >>>>> I haven't noticed a significant impact on benchmarks, but grepping for
> >>>>> keywords like "Spilling", "Push.*spill" etc. in the IRA RTL dump shows
> >>>>> smaller numbers.
> >>>>> 
> >>>>> OK for trunk?
> >>>> 
> >>>> OK, sorry for the delay.
> >>>> 
> >>>> It might be mail client mangling, but please check that the trailing
> >>>> slashes line up in the version that gets committed.
> >>>> 
> >>>> Thanks,
> >>>> James
> >>>> 
> >>>>> 2015-05-19  Jiong Wang  <jiong.w...@arm.com>
> >>>>> 
> >>>>> gcc/
> >>>>>  PR 63521
> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
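
[For anyone not familiar with these hooks, here is a rough sketch of the
kind of definitions the ChangeLog refers to.  It is illustrative only:
the numbering assumes x0-x30 occupy hard registers 0-30, and the exact
ordering shown is not the one from the reverted patch.]

    /* Illustrative sketch, not the committed definition.
       REG_ALLOC_ORDER lists hard register numbers in the order the
       allocator should try them; defining HONOR_REG_ALLOC_ORDER tells
       IRA to trust that order instead of its own cost calculation.
       Caller-saved registers come before callee-saved ones.  */
    #define REG_ALLOC_ORDER                                           \
    {                                                                 \
      /* Caller-saved argument and temporary registers first.  */     \
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,   \
      30,                          /* LR, also caller-saved */        \
      /* Then the callee-saved registers.  */                         \
      19, 20, 21, 22, 23, 24, 25, 26, 27, 28,                         \
      29,                          /* frame pointer */                \
      /* ... x18, SP, the FP/SIMD registers and the remaining         \
         fixed registers would follow here ...  */                    \
    }

    #define HONOR_REG_ALLOC_ORDER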
> >>> 
> >>> Patch reverted.
> >> 
> >> I did not see a reason why this patch was reverted.  Maybe I am
> >> missing an email or something.
> > 
> > There are several execution regressions in the GCC testsuite, although as
> > far as I can see this patch merely exposed hidden bugs in those testcases.
> > Still, there might be some other issue, so to be conservative I have
> > temporarily reverted the patch.
> 
> If you are talking about:
> gcc.target/aarch64/aapcs64/func-ret-2.c execution
> Etc.
> 
> These test cases are too dependent on the original register allocation order
> and can safely be ignored.  Really, these three tests should be moved or
> rewritten in a saner way.

Yup, completely agreed - but the testcases do throw up something
interesting. If we are allocating registers to hold 128-bit values and
we pick x7 as the highest preference, we implicitly allocate x8 along
with it. I think we would probably see the same thing if the first thing
we do in a function is a structure copy through a back-end expanded
movmem, which will likely begin with a 128-bit LDP using x7, x8.
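
[To make the pairing concrete, here is a small hypothetical example; the
type and function names are invented purely for illustration.  The
aggregate copy below may be expanded inline into 16-byte chunks held in
TImode temporaries, and each such temporary occupies two consecutive X
registers, so if x7 is the allocator's first choice the pair becomes
x7/x8.]

    /* Hypothetical example: the 32-byte copy may be expanded by the
       back end into ldp/stp of 16-byte chunks, each chunk living in a
       consecutive pair of X registers (e.g. x7 and x8 if x7 is the
       allocator's first preference).  */
    struct blob { long w[4]; };

    void
    copy_blob (struct blob *dst, const struct blob *src)
    {
      *dst = *src;
    }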

If the argument for this patch is that we prefer to allocate x7-x0 first,
followed by x8, then we've potentially made a sub-optimal decision: our
allocation order for 128-bit values is effectively x7,x8, then x5,x6, and
so on.

My hunch is that we *might* get better code generation in this corner case
out of some permutation of the allocation order for argument
registers. I'm thinking something along the lines of

  {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
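
[For concreteness, the same hunch written as the start of a hypothetical
REG_ALLOC_ORDER fragment, again assuming x0-x30 occupy hard registers
0-30; this only illustrates the idea and is not a tested ordering.]

    /* Hypothetical fragment, not a tested or committed ordering.  With
       x6 tried first, a 128-bit value would presumably land in the
       x6/x7 pair rather than spilling over into x8.  */
    #define REG_ALLOC_ORDER                                           \
    {                                                                 \
      6, 5, 4, 7, 3, 2, 1, 0, 8, /* ... remaining registers ...  */   \
    }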

I asked Jiong to take a look at that, and I agree with his decision to
reduce the churn on trunk and just revert the patch until we've come to
a conclusion based on some evidence - rather than just my hunch! I agree
that it would be harmless on trunk from a testing point of view, but I
think Jiong is right to revert the patch until we better understand the
code-generation implications.

Of course, it might be that I am completely wrong! If you've already taken
a look at using a register allocation order like the example I gave and
have something to share, I'd be happy to read your advice!

Thanks,
James
