From: "Tony Harminc" <[email protected]>
Sent: Tuesday, June 03, 2014 3:30 AM


On 2 June 2014 11:00, Rob van der Heij <[email protected]> wrote:
The optimized gcc code was more like this (for 3 bytes)

    IC    R2,1(R5)
    LHI   R3,0
    AR    R2,R1
    AR    R0,R1
    IC    R3,2(R5)
    LHI   R1,0
    AR    R3,R2
    AR    R0,R2
    IC    R1,3(R5)
    LHI   R2,0
    AR    R1,R3
    AR    R0,R3

As I understand, the LHI is done earlier in the stream to allow overlap
with the other instructions.
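
For readers less used to the instruction set, here is a rough C sketch of what the three-byte sequence appears to compute. It assumes R1 carries a running byte sum, R0 carries a running sum of those sums, and each IC target register was zeroed by an earlier LHI. The names `sums` and `step3` are hypothetical and are not from the original code, and `p` is assumed to point at the first of the three bytes (offset 1 from R5 in the listing).

```c
#include <stdint.h>

/* Hypothetical names: `sums` and `step3` are illustrative only. */
struct sums {
    uint32_t a;  /* running byte sum (R1 on entry and on exit) */
    uint32_t b;  /* running sum of the `a` values (R0) */
};

/* Process three bytes starting at p, mirroring the unrolled sequence. */
static struct sums step3(struct sums s, const unsigned char *p)
{
    uint32_t a1, a2, a3;

    a1 = p[0] + s.a;   /* IC R2,1(R5) ; AR R2,R1 */
    s.b += s.a;        /* AR R0,R1 */
    a2 = p[1] + a1;    /* IC R3,2(R5) ; AR R3,R2 */
    s.b += a1;         /* AR R0,R2 */
    a3 = p[2] + a2;    /* IC R1,3(R5) ; AR R1,R3 */
    s.b += a2;         /* AR R0,R3 */

    s.a = a3;          /* the register roles rotate: R1 holds the sum again */
    return s;
}
```

Because IC replaces only the low-order byte of its target, each target has to be cleared before the insert; the sketch hides that step because C zero-extends the byte when it is loaded.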

Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but
SR is half the size, and so lessens the amount of i-cache used.

XR Rn,Rn is faster than SR.
But does it matter?
Such an instruction should be executed once only.
It shouldn't be in the loop.
