Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

Jeff Law Wed, 08 Oct 2014 15:22:35 -0700

On 10/08/14 04:27, Ramana Radhakrishnan wrote:

If the port has a splitter to rip apart a double-word load into single-word loads,
then we'd obviously only want to do that in cases where the double-word load 
actually generates > 1 assembly instruction.


Or indeed if it is really a performance win. And I think that should
purely be a per port / micro-architectural decision.

Agreed.

Generating more ldrd's and strd's will be beneficial in the ARM and
the AArch64 port - we save code size and start using more memory
bandwidth available per instruction on most higher end cores that I'm
aware of. Even on the smaller microcontrollers I expect it to be a win
because you've saved code size. There may well be pathological cases
given we've shortened some dependencies or increased lifetimes of
others but overall I'd expect it to be more positive than negative.

Agreed. I suspect there's multiple architectures where the results
would be similar -- code size improvements, more effective use of memory
bandwidth with possibly some pathological case(s) that we really
shouldn't worry too much about.

I also expect this to be more effective in the T32 (Thumb2) ISA and
AArch64 because ldrd / strd and ldp / stp respectively can work with
any registers unlike the A32 ISA where the registers loaded or stored
must be consecutive registers. I'm hoping for some more review on the
generic bits before looking into the backend implementation in the
expectation that this is the direction folks want to proceed.

I've got some questions that I'm formulating to make sure I understand
how the facility is to be used. I may have to simply sit down with the
code installed on a test build and play with it.


However, to be clear, I really like the direction this work has gone.

Jeff
