<snip>
I know "it's the one not taken!". But of the B, J, or BR, can they be ordered?
I am 99.9% certain that having the branch address in a register is the fastest.
But is it significant enough that I should dedicate a register for it? I'm
asking because I'm reoptimizing some code which is very heavily used. So heavy,
that a 1% improvement is worthwhile. I am currently holding 4 different
addresses in registers to speed up branch processing in my main loop. This
loop is basically compressing blanks using a primitive RLE algorithm.
</snip>

I ran a number of tests on our z9 and, after a few tries, decided the choice
between B, J and BR is trivial. My tests ran through 12 different strings 1
million times, and I ran each test 5 times. Searching with a simple CLI, BE,
LA, BCT loop was much slower than using the SRST instruction. Like the CLCL
you are using, SRST scans the data for you in one instruction, so you don't
have to manage your own loop. That cut 50% off the reported service units in
my tests. When I changed from BE and BNE to JE and JNE, service units actually
increased by about 0.1 percent. When I changed to BER and BNER, I also saw a
slight increase, probably because I had to prepare the destination register
each time through the million trials.
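
For reference, a scan-for-blank with SRST might look roughly like the sketch
below. This is not the code from my tests. The register choices and the names
BUFFER, BUFLEN, NOBLANK and GOTBLANK are made up for illustration, and it
assumes the usual R0-R15 equates. The one wrinkle is condition code 3: the
CPU may stop before it reaches the end, so the instruction is simply
re-executed.

```hlasm
* Sketch only: R0, R4, R5, BUFFER, BUFLEN, GOTBLANK and NOBLANK are
* hypothetical names (assumes the usual R0-R15 register equates).
         LA    R0,C' '            GR0 bits 56-63 = blank (X'40')
         LA    R4,BUFFER          R4 = first byte to scan
         LA    R5,BUFFER+BUFLEN   R5 = first byte past the scan area
SCAN     SRST  R5,R4              Scan (R4) up to, excluding, (R5)
         BC    1,SCAN             CC3: partial scan, resume at R4
         BC    2,NOBLANK          CC2: end reached, no blank found
*                                 CC1: R5 = address of the first blank
GOTBLANK DS    0H
```

On CC1 the start register (R4 here) is left unchanged, so R5 minus R4 gives
the length of the non-blank run in front of the blank.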

Using TRT to find data doubled the service units. Using TR to manipulate the
data so a second SRST could find the non-blanks more than doubled the times.

Conclusion: Change the scan-for-blank loop to use SRST. You should get a lot 
of mileage out of that one change.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html