Re: Out of Order and Superscalar - small experiment

Robin Vowels Tue, 03 Jun 2014 03:04:28 -0700

From: "Tony Harminc" <[email protected]>
Sent: Tuesday, June 03, 2014 11:52 AM

On 2 June 2014 20:14, Robin Vowels <[email protected]> wrote:

From: "Rob van der Heij" <[email protected]>
Sent: Tuesday, June 03, 2014 1:00 AM

More recently I've been working on porting Linux gcc object code to CMS,
and now that I needed a nice checksum routine, I figured I might take a
popular open source checksum routine http://en.wikipedia.org/wiki/Adler-32
and let gcc compile and optimize it. Since the generated assembler source
wasn't that obvious to me, I got curious about why gcc produced it that way.

My simplistic implementation was like this (executed for each byte, so
wrapped in a loop):

         IC    R4,0(R6)
         AR    R2,R4
         AR    R3,R2
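For readers less familiar with Adler-32, here is a minimal C sketch of what that three-instruction body computes: R2 accumulates the byte sum (s1, which starts at 1) and R3 accumulates the running sum of s1 (s2). The function and macro names are illustrative choices, not taken from the thread, and the per-block reduction follows zlib's approach of deferring the modulo 65521 for up to 5552 bytes. It is not the gcc output under discussion.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u  /* largest prime below 2^16 */
/* Largest n such that 255n(n+1)/2 + (n+1)(ADLER_MOD-1) <= 2^32-1:
   how many bytes can be summed before s2 could overflow 32 bits. */
#define ADLER_NMAX 5552u

uint32_t adler32(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1;  /* running byte sum  (R2 in the loop above) */
    uint32_t s2 = 0;  /* running sum of s1 (R3 in the loop above) */

    while (len > 0) {
        size_t block = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= block;
        while (block--) {
            s1 += *buf++;  /* IC R4,0(R6) ; AR R2,R4 */
            s2 += s1;      /* AR R3,R2 */
        }
        /* Reduce once per block instead of once per byte. */
        s1 %= ADLER_MOD;
        s2 %= ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```

For example, `adler32((const unsigned char *)"Wikipedia", 9)` gives 0x11E60398, the value worked out on the Wikipedia page linked above.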


Must have missed something here.

A 3-instruction loop to sum bytes.

         LA    6,X+offset     (last byte of area to be summed)
         SR    2,2
         SR    4,4
Loop     IC    4,0(0,6)
         AR    2,4
         BCT   6,Loop
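A C analogue of the 3-instruction loop, offered as a sketch under one stated assumption: the count register holds the remaining length, used as an offset from the start of the area, rather than an absolute address. BCT simply decrements its register and branches while the result is nonzero, so the register has to reach zero exactly when the area is exhausted. `sum_bytes_backward` is a hypothetical name. Like the quoted loop, it produces only the plain byte sum; the second Adler-32 total (the AR R3,R2 step) is not part of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum buf[0..len-1], walking backwards from the last byte.
   len serves as both the loop count (the BCT register) and the
   offset of the current byte.  Assumes len > 0: a zero count would
   wrap around, just as BCT would with a zero register. */
uint32_t sum_bytes_backward(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;              /* SR 2,2 */
    do {
        sum += buf[len - 1];       /* IC 4,... ; AR 2,4 */
    } while (--len != 0);          /* BCT: decrement, branch if nonzero */
    return sum;
}
```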

And you can use BCTR to save a few µs.

Why do you think BCTR would save such a large amount of time?


And while I have the manual out, BCTR is from 40% to 70% faster than BCT.

