On 10/08/14 04:27, Ramana Radhakrishnan wrote:
If the port has a splitter to rip apart a double-word load into single-word loads,
then we'd obviously only want to do that in cases where the double-word load 
actually generates > 1 assembly instruction.

Or indeed if it is really a performance win. And I think that should
purely be a per port / micro-architectural decision.
Agreed.

Generating more ldrd's and strd's will be beneficial in the ARM and
the AArch64 port - we save code size and start using more memory
bandwidth available per instruction on most higher end cores that I'm
aware of. Even on the smaller microcontrollers I expect it to be a win
because you've saved code size. There may well be pathological cases
given we've shortened some dependencies or increased lifetimes of
others but overall I'd expect it to be more positive than negative.
Agreed. I suspect there are multiple architectures where the results would be similar -- code size improvements, more effective use of memory bandwidth with possibly some pathological case(s) that we really shouldn't worry too much about.

I also expect this to be more effective in the T32 (Thumb2) ISA and
AArch64 because ldrd / strd and ldp / stp respectively can work with
any registers unlike the A32 ISA where the registers loaded or stored
must be consecutive registers. I'm hoping for some more review on the
generic bits before looking into the backend implementation in the
expectation that this is the direction folks want to proceed.
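
To make the register constraint above concrete, here is a minimal sketch in C of a check for whether two destination registers could be served by one paired load. The enum, the function name and the register numbering are hypothetical and are not GCC identifiers; the check is deliberately simplified and ignores SP/PC and other special-register restrictions.

```c
#include <stdbool.h>

/* Hypothetical ISA tags for illustration only; not GCC identifiers. */
enum isa { ISA_A32, ISA_T32, ISA_AARCH64 };

/* Return true if a two-register load into (rt, rt2) could plausibly use a
   single paired-load instruction.  Simplified sketch: SP/PC and other
   special-register restrictions are not modelled.  */
static bool
pair_regs_ok (enum isa isa, unsigned rt, unsigned rt2)
{
  /* Loading the same destination register twice is never what we want.  */
  if (rt == rt2)
    return false;

  /* A32 ldrd: the first register must be even-numbered and the second
     must be the next consecutive register.  */
  if (isa == ISA_A32)
    return (rt % 2 == 0) && (rt2 == rt + 1);

  /* T32 ldrd and AArch64 ldp: any two distinct registers are accepted
     in this simplified model.  */
  return true;
}
```

Under this model a pass that forms pairs would find many more candidates on T32 and AArch64 than on A32, where register allocation has to cooperate to produce an even/odd consecutive pair.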
I've got some questions that I'm formulating to make sure I understand how the facility is to be used. I may have to simply sit down with the code installed on a test build and play with it.

However, to be clear, I really like the direction this work has gone.

Jeff
