On 2 June 2014 19:30, Tony Harminc <[email protected]> wrote:
> Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but
> SR is half the size, and so lessens the amount of i-cache used.

The effect of the footprint is a challenge to measure, but it would also argue against unrolling the loop for 16 bytes. Apart from that, replacing the LHI Rx,0 with SR Rx,Rx made it a bit slower (and LA is just the same as LHI, despite what others hinted). I also notice that the DR is not that expensive, so avoiding the division by frequent compare and subtract might not be a clear win either.
