On Wed, Oct 1, 2014 at 5:06 AM, Mike Stump <mikest...@comcast.net> wrote:
> On Sep 30, 2014, at 2:22 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Then I decided to take one step forward to introduce a generic
>> instruction fusion infrastructure in GCC, because in essence, load/store
>> pair is nothing different with other instruction fusion, all these
>> optimizations want is to push instructions together in instruction flow.
>
> I like the step you took.  I had exactly this in mind when I wrote the 
> original.
>
>> N0 ~= 1300
>> N1/N2 ~= 5000
>> N3 ~= 7500
>
> Nice.  Would be nice to see metrics for time to ensure that the code isn't 
> actually worse (CSiBE and/or spec and/or some other).  I didn't have any 
> large scale benchmark runs with my code and I did worry about extending 
> lifetimes and register pressure.

Hi Mike,
I did collect spec2k performance numbers after pairing loads/stores with
this patch, on both aarch64 and cortex-a15.  Performance clearly improves,
especially on cortex-a57, although a few benchmarks regress slightly.
Register pressure is not a problem here because this pass runs between
register allocation and sched2, and I expect sched2 to resolve most of
the pipeline hazards this pass introduces.
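
As a rough illustration of what "pairable" means in this discussion, here
is a minimal sketch in C.  The struct and function names are hypothetical
and are not taken from the patch or from GCC.  It only checks that two
accesses are of the same kind and width, use the same base register, and
touch adjacent offsets.  Real ldp/stp selection has further constraints
(offset range, alignment, distinct destination registers) that are ignored
here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one memory access.
   This is an illustration only, not a GCC data structure.  */
struct mem_access {
    int base_reg;   /* register holding the base address */
    long offset;    /* constant byte offset from the base */
    size_t size;    /* access width in bytes */
    bool is_load;   /* true for a load, false for a store */
};

/* Return true if A and B are both loads or both stores of the same
   width, use the same base register, and their offsets differ by
   exactly one access width (in either order).  */
static bool can_pair(const struct mem_access *a, const struct mem_access *b)
{
    if (a->is_load != b->is_load
        || a->base_reg != b->base_reg
        || a->size != b->size)
        return false;

    long lo = a->offset < b->offset ? a->offset : b->offset;
    long hi = a->offset < b->offset ? b->offset : a->offset;
    return hi - lo == (long) a->size;
}
```

Under these assumptions, two 8-byte loads at [x1, 0] and [x1, 8] would be
candidates for pairing, while loads at [x1, 0] and [x1, 16], or loads off
different base registers, would not.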

>
>> I cleared up Mike's patch and fixed some implementation bugs in it
>
> So, I'm wondering what the bugs or missed opportunities were?  And, if they 
> were of the type of problem that generated incorrect code or if they were of 
> the type that was merely a missed opportunity.

Just missed opportunities, not wrong-code bugs.

Thanks,
bin
