On 2 June 2014 11:00, Rob van der Heij <[email protected]> wrote:
> The optimized gcc code was more like this (for 3 bytes)
>
>     IC    R2,1(R5)
>     LHI   R3,0
>     AR    R2,R1
>     AR    R0,R1
>     IC    R3,2(R5)
>     LHI   R1,0
>     AR    R3,R2
>     AR    R0,R2
>     IC    R1,3(R5)
>     LHI   R2,0
>     AR    R1,R3
>     AR    R0,R3
>
> As I understand, the LHI is done earlier in the stream to allow overlap
> with the other instructions.
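One way to see what the quoted sequence computes, and why each register is cleared before it is loaded, is to model it. This Python sketch is a reading of the register usage, not the original C source, which isn't in the thread. It assumes R5 points at the data, R1 carries a running byte sum, R0 accumulates the previous value of that sum, and R2 was zeroed by an earlier LHI. IC replaces only the low 8 bits of a register and leaves the rest alone, so the register has to be zeroed first. The function names are hypothetical.

```python
# Hypothetical model of the quoted sequence; register roles are inferred.
MASK32 = 0xFFFFFFFF


def ic(reg: int, byte: int) -> int:
    """IC: replace the low 8 bits of the register, keep the other bits."""
    return (reg & ~0xFF & MASK32) | byte


def run_quoted(regs: dict, mem: bytes) -> dict:
    """Execute the twelve quoted instructions on a copy of the registers."""
    r = dict(regs)

    def ar(x: str, y: str) -> None:
        # AR: 32-bit wrapping add; the condition code is not modelled
        r[x] = (r[x] + r[y]) & MASK32

    base = r["R5"]
    r["R2"] = ic(r["R2"], mem[base + 1])  # IC  R2,1(R5)  (R2 assumed zeroed earlier)
    r["R3"] = 0                           # LHI R3,0
    ar("R2", "R1")                        # AR  R2,R1
    ar("R0", "R1")                        # AR  R0,R1
    r["R3"] = ic(r["R3"], mem[base + 2])  # IC  R3,2(R5)
    r["R1"] = 0                           # LHI R1,0
    ar("R3", "R2")                        # AR  R3,R2
    ar("R0", "R2")                        # AR  R0,R2
    r["R1"] = ic(r["R1"], mem[base + 3])  # IC  R1,3(R5)
    r["R2"] = 0                           # LHI R2,0
    ar("R1", "R3")                        # AR  R1,R3
    ar("R0", "R3")                        # AR  R0,R3
    return r


def reference(a: int, b: int, data: bytes) -> tuple:
    """Plain loop with the same effect: b picks up a, then the byte is added to a."""
    for byte in data:
        b = (b + a) & MASK32
        a = (a + byte) & MASK32
    return a, b


if __name__ == "__main__":
    mem = bytes([0x00, 0x10, 0x20, 0x30])
    start = {"R0": 5, "R1": 7, "R2": 0, "R3": 0x12345678, "R5": 0}
    out = run_quoted(start, mem)
    print(out["R1"], out["R0"])       # prints 103 90
    print(reference(7, 5, mem[1:4]))  # prints (103, 90)
```

The garbage in R3 is harmless only because LHI R3,0 clears it before the IC; without that, the upper 24 bits would survive into the sum.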

Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but
SR is half the size, and so lessens the amount of i-cache used.
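To put numbers on the size difference, here is a small sketch that builds the two encodings: SR is a 2-byte RR-format instruction (opcode X'1B'), LHI a 4-byte RI-format instruction (opcode X'A7' with extension X'8' and a 16-bit immediate). The helper names are made up for the example. It says nothing about speed, which depends on the machine.

```python
# Instruction encodings (z/Architecture): SR is RR format, LHI is RI format.


def sr(r1: int, r2: int) -> bytes:
    """SR R1,R2: opcode X'1B', then the two register numbers."""
    return bytes([0x1B, (r1 << 4) | r2])


def lhi(r1: int, imm: int) -> bytes:
    """LHI R1,I2: opcode X'A7', register, extension X'8', 16-bit immediate."""
    return bytes([0xA7, (r1 << 4) | 0x8]) + (imm & 0xFFFF).to_bytes(2, "big")


if __name__ == "__main__":
    zeroed = [3, 1, 2]  # registers cleared by the LHIs in the quoted 3-byte sequence
    print(lhi(3, 0).hex(), sr(3, 3).hex())     # prints a7380000 1b33
    print(sum(len(lhi(r, 0)) for r in zeroed),
          sum(len(sr(r, r)) for r in zeroed))  # prints 12 6
```

For the three zeroing instructions in the quoted sequence, that is 12 bytes of LHI against 6 bytes of SR.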

Tony H.
