On 2 June 2014 19:30, Tony Harminc <[email protected]> wrote:

> Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but
> SR is half the size, and so lessens the amount of i-cache used.
>

The effect of the code footprint is hard to measure, but the same argument
would also count against unrolling the loop for 16 bytes. That aside,
replacing the LHI Rx,0 with SR Rx,Rx made it slightly slower (and LA performs
just the same as LHI, despite what others hinted). I also notice that the DR
is not that expensive, so replacing the division with frequent
compare-and-subtract steps might not be a clear win either.
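
To make the trade-off concrete, here is a minimal sketch in C of the two ways
to reduce a value modulo m. It is not the code under discussion: the function
names and the unsigned types are assumptions chosen purely for illustration.

```c
#include <assert.h>

/* Hypothetical helper: remainder via a single divide
   (the compiler would typically emit a divide instruction here). */
static unsigned rem_divide(unsigned x, unsigned m)
{
    assert(m != 0);
    return x % m;
}

/* Hypothetical helper: remainder via repeated compare and subtract.
   Each iteration costs a compare, a branch and a subtract, so this only
   pays off when x rarely exceeds m by more than a small multiple. */
static unsigned rem_subtract(unsigned x, unsigned m)
{
    assert(m != 0);          /* m == 0 would loop forever */
    while (x >= m)
        x -= m;
    return x;
}
```

Both functions return the same result; the difference is only in cost. If the
divide is cheap, the extra compares and branches of the second form can easily
eat up whatever was saved.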
