On 2 June 2014 11:00, Rob van der Heij <[email protected]> wrote:
> The optimized gcc code was more like this (for 3 bytes)
>
>     IC    R2,1(R5)
>     LHI   R3,0
>     AR    R2,R1
>     AR    R0,R1
>     IC    R3,2(R5)
>     LHI   R1,0
>     AR    R3,R2
>     AR    R0,R2
>     IC    R1,3(R5)
>     LHI   R2,0
>     AR    R1,R3
>     AR    R0,R3
>
> As I understand, the LHI is done earlier in the stream to allow overlap
> with the other instructions.

Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but SR is
half the size, and so lessens the amount of i-cache used.

Tony H.
