On Wed, Oct 1, 2014 at 5:06 AM, Mike Stump <mikest...@comcast.net> wrote:
> On Sep 30, 2014, at 2:22 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Then I decided to take one step forward to introduce a generic
>> instruction fusion infrastructure in GCC, because in essence, load/store
>> pair is nothing different with other instruction fusion, all these
>> optimizations want is to push instructions together in instruction flow.
>
> I like the step you took.  I had exactly this in mind when I wrote the
> original.
>
>> N0 ~= 1300
>> N1/N2 ~= 5000
>> N3 ~= 7500
>
> Nice.  Would be nice to see metrics for time to ensure that the code isn't
> actually worse (CSiBE and/or spec and/or some other).  I didn't have any
> large scale benchmark runs with my code and I did worry about extending
> lifetimes and register pressure.

Hi Mike,
I did collect spec2k performance numbers after pairing load/store with this
patch on both aarch64 and cortex-a15.  Performance clearly improves,
especially on cortex-a57.  A few benchmarks (though not many) regress
slightly.  There is no register pressure problem here because this pass runs
between register allocation and sched2; I expect sched2 to resolve most
pipeline hazards introduced by this pass.

>
>> I cleared up Mike's patch and fixed some implementation bugs in it
>
> So, I'm wondering what the bugs or missed opportunities were?  And, if they
> were of the type of problem that generated incorrect code or if they were of
> the type that was merely a missed opportunity.

Just missed opportunity issues.

Thanks,
bin