Andrew Pinski writes:

> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>> >
>>> > Andrew Pinski writes:
>>> >
>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>> >>>
>>> >>> James Greenhalgh writes:
>>> >>>
>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> >>>>> Current IRA still uses both target macros in a few places.
>>> >>>>>
>>> >>>>> Tell IRA to use the order we defined rather than its own cost
>>> >>>>> calculation: allocate caller-saved registers first, then
>>> >>>>> callee-saved ones.
>>> >>>>>
>>> >>>>> This is especially useful for LR/x30, as it is free to allocate
>>> >>>>> and is purely caller-saved when used in a leaf function.
>>> >>>>>
>>> >>>>> I haven't noticed a significant impact on benchmarks, but grepping
>>> >>>>> for keywords like "Spilling", "Push.*spill" etc. in the IRA RTL
>>> >>>>> dump shows the numbers are smaller.
>>> >>>>>
>>> >>>>> OK for trunk?
>>> >>>>
>>> >>>> OK, sorry for the delay.
>>> >>>>
>>> >>>> It might be mail client mangling, but please check that the trailing
>>> >>>> slashes line up in the version that gets committed.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> James
>>> >>>>
>>> >>>>> 2015-05-19  Jiong Wang  <jiong.w...@arm.com>
>>> >>>>>
>>> >>>>> gcc/
>>> >>>>>  PR 63521
>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
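
(For context, a minimal sketch of the two macros named in the ChangeLog,
shown for a hypothetical eight-register target rather than AArch64's
real register file; the numbering and the caller/callee split below are
illustrative assumptions, not the reverted patch itself.)

  /* Toy port with hard registers 0-7, where 0-3 are caller-saved and
     4-7 are callee-saved.  Listing the caller-saved registers first
     makes the allocator prefer them, so leaf functions need fewer
     callee-saved save/restore sequences.  */
  #define REG_ALLOC_ORDER { 0, 1, 2, 3, 4, 5, 6, 7 }

  /* Make IRA honor the order above instead of applying its own
     cost-based ordering.  */
  #define HONOR_REG_ALLOC_ORDER 1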
>>> >>>
>>> >>> Patch reverted.
>>> >>
>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>> >> missing an email or something.
>>> >
>>> > There are several execution regressions in the GCC testsuite. As far
>>> > as I can see this patch merely exposed hidden bugs in those testcases,
>>> > but there might be another issue, so to be conservative I temporarily
>>> > reverted the patch.
>>>
>>> If you are talking about:
>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>> Etc.
>>>
>>> These test cases are too dependent on the original register allocation
>>> order and can safely be ignored. Really, these three tests should be
>>> moved or rewritten in a saner way.
>>
>> Yup, completely agreed - but the testcases do throw up something
>> interesting. If we are allocating registers to hold 128-bit values, and
>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>> I think we probably see the same thing if the first thing we do in a
>> function is a structure copy through a back-end expanded movmem, which
>> will likely begin with a 128-bit LDP using x7, x8.
>>
>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>> followed by x8, then we've potentially made a sub-optimal decision: our
>> allocation order for 128-bit values becomes x7, x8, x5, x6, etc.
>>
>> My hunch is that we *might* get better code generation in this corner case
>> out of some permutation of the allocation order for argument
>> registers. I'm thinking something along the lines of
>>
>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>
>> I asked Jiong to take a look at that, and I agree with his decision to
>> reduce the churn on trunk and just revert the patch until we've come to
>> a conclusion based on some evidence - rather than just my hunch! I agree
>> that it would be harmless on trunk from a testing point of view, but I
>> think Jiong is right to revert the patch until we better understand the
>> code-generation implications.
>>
>> Of course, it might be that I am completely wrong! If you've already taken
>> a look at using a register allocation order like the example I gave and
>> have something to share, I'd be happy to read your advice!
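
(For context, a small sketch of the pair-allocation effect described
above, assuming a 16-byte structure copy; the function and the assembly
are illustrative, not actual GCC output.)

  /* Copying a 16-byte object may expand to a 128-bit load/store pair.
     If x7 is the most-preferred register, the 128-bit value lands in
     the consecutive pair x7:x8, e.g.

         ldp  x7, x8, [x1]   // load 128 bits into the pair x7, x8
         stp  x7, x8, [x0]   // store the same pair

     so preferring x7 first implicitly allocates x8 as well.  */
  struct s { long a, b; };

  void
  copy (struct s *dst, const struct s *src)
  {
    *dst = *src;   /* may go through the back end's movmem expansion */
  }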
>
> Any news on this patch?  It has been a year since it was reverted for
> a bad test that was failing.

Hi Andrew,

  Yeah, those tests actually are expected to fail once the register
  allocation order changes; this is clearly documented in the comments:

  gcc.target/aarch64/aapcs64/abitest-2.h:
  
/* ...

   Note that for a value that is returned in the caller-allocated memory
   block, we get the address from the saved x8 register.  x8 is saved
   just after the callee returns; we assume that x8 has not been
   clobbered by then, although there is no requirement for the callee
   to preserve the value stored in x8.  Luckily, all test cases here are
   simple enough that x8 doesn't normally get clobbered (although this
   is not guaranteed).  */
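
  (For context, a minimal C sketch of the mechanism that comment
  describes; under AAPCS64 a result too large for the x0-x7 return
  registers is written to a caller-allocated block whose address is
  passed in x8, and x8 need not survive the call.)

  /* 32 bytes: too large to return in registers, so returned in
     caller-allocated memory addressed by x8.  */
  struct big { long a[4]; };

  struct big
  make_big (void)             /* caller supplies the address in x8 */
  {
    struct big b = { { 1, 2, 3, 4 } };
    return b;                 /* stored through the x8 pointer */
  }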

  I had a local fix which used the redundant value returned in x0 to
  repair the clobbered value in x8, as the two will be identical for a
  structure-type return. However, that trick can't be played any more,
  as we recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which
  removes that redundant x8-to-x0 copy.
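
  (Continuing the sketch above, the epilogue difference is roughly the
  following; this is schematic, not real compiler output.)

  /* Epilogue of a function like make_big above:

       before the hook:   ... stores through x8 ...
                          mov  x0, x8   // redundant copy; the local
                          ret           // fix recovered x8 from it

       with TARGET_OMIT_STRUCT_RETURN_REG true:
                          ... stores through x8 ...
                          ret           // no x0 copy left to use  */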
  
  Anyway, I will come back with some benchmark results for this patch on
  top of the latest trunk after the weekend run, along with answers to
  James's concerns.

-- 
Regards,
Jiong
