Re: Out of Order and Superscalar - small experiment

Robin Vowels Tue, 03 Jun 2014 00:19:23 -0700

From: "Tony Harminc" <[email protected]>
Sent: Tuesday, June 03, 2014 11:52 AM



On 2 June 2014 20:14, Robin Vowels <[email protected]> wrote:

From: "Rob van der Heij" <[email protected]>
Sent: Tuesday, June 03, 2014 1:00 AM

My simplistic implementation was like this (for each byte, so wrapped in a
loop)

    IC    R4,0(R6)
    AR    R2,R4
    AR    R3,R2

Must have missed something here.

I think what you missed was the reference to the Adler-32 algorithm,


No.  My remark was based on the code shown.

with its need to keep two 16-bit sums.


That's irrelevant.

A 3-instruction loop to sum bytes.

         LA    6,X+offset         (last byte of area to be summed)
         SR    2,2
         SR    4,4
Loop     IC    4,0(0,6)
         AR    2,4
         BCT   6,Loop


And you can use BCTR to save a few µs.

Why do you think BCTR would save such a large amount of time?


I didn't say "large", but you will be interested to know that with
BCTR, the branch address is kept in the second register,
(which is loaded once, prior to entering the loop)
and that therefore BCTR executes faster than BCT.

Perhaps you're again talking about old machines. Surely BRCT/JCT would be the
time saver on a current machine if there is one for this case.


BCTR runs just as well on current machines.
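
For readers following the disagreement above, here is a minimal C sketch of the two computations being compared: the plain byte sum done by the IC / AR / BCT loop, and the two running sums that Adler-32 keeps, which is what the IC / AR / AR sequence updates for each byte. The function names are made up for illustration, and the modulo is applied on every byte for simplicity, which is an assumption and not how either poster necessarily coded it. The register comments map onto the quoted instructions only loosely.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain byte sum: one accumulator, as in the IC / AR / BCT loop. */
uint32_t byte_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Adler-32 sketch: two sums, a (bytes) and b (running total of a).
 * Assumption: reduce modulo 65521 on every byte; real code usually
 * defers the reduction to keep the inner loop short. */
uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
    const uint32_t MOD = 65521u;   /* largest prime below 2^16 */
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % MOD;    /* roughly: IC R4,0(R6) ; AR R2,R4 */
        b = (b + a) % MOD;         /* roughly: AR R3,R2 */
    }
    return (b << 16) | a;          /* b in the high half, a in the low half */
}
```

The second sum is why the third instruction is there: it depends on the order of the bytes, which a plain byte sum does not capture.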


