This is a small part of the code from a routine that decodes Base64-encoded data:

         DO FROM=(R2,)
           LG    R0,0(,R4)             Load 8 source bytes
           LA    R4,8(,R4)             R4 past 8 source bytes
           RISBG R0,R0,06,11,02        aaabbbb0ccc0ddd0eee0fff0ggg0hhh0
           RISBG R0,R0,12,17,04        aaabbbccccc0ddd0eee0fff0ggg0hhh0
           RISBG R0,R0,18,23,06        aaabbbcccdddddd0eee0fff0ggg0hhh0
           RISBG R0,R0,24,29,08        aaabbbcccdddeee0eee0fff0ggg0hhh0
           RISBG R0,R0,30,35,10        aaabbbcccdddeeefffe0fff0ggg0hhh0
           RISBG R0,R0,36,41,12        aaabbbcccdddeeefffgggff0ggg0hhh0
           RISBG R0,R0,42,47,14        aaabbbcccdddeeefffggghhhggg0hhh0
           STG   R0,0(R6)              Store 6 databytes
           LA    R6,6(,R6)             R6 past 6 databytes
         ENDDO

It takes 8 source bytes (already translated so that each contains 6 data bits) and combines the six data bits from each byte into 6 data bytes. In the comments, each character stands for two bits of R0, so every source byte shows as four characters. The code consists of a series of RISBG instructions that all act on the same register. I figured it might be a lot faster to use two registers and interleave these operations:

         DO FROM=(R2,)
           LMG   R0,R1,0(,R4)          Load 16 source bytes
           LA    R4,16(,R4)            R4 past 16 source bytes
           RISBG R0,R0,06,11,02        aaabbbb0ccc0ddd0eee0fff0ggg0hhh0
           RISBG R1,R1,06,11,02        aaabbbb0ccc0ddd0eee0fff0ggg0hhh0
           RISBG R0,R0,12,17,04        aaabbbccccc0ddd0eee0fff0ggg0hhh0
           RISBG R1,R1,12,17,04        aaabbbccccc0ddd0eee0fff0ggg0hhh0
           RISBG R0,R0,18,23,06        aaabbbcccdddddd0eee0fff0ggg0hhh0
           RISBG R1,R1,18,23,06        aaabbbcccdddddd0eee0fff0ggg0hhh0
           RISBG R0,R0,24,29,08        aaabbbcccdddeee0eee0fff0ggg0hhh0
           RISBG R1,R1,24,29,08        aaabbbcccdddeee0eee0fff0ggg0hhh0
           RISBG R0,R0,30,35,10        aaabbbcccdddeeefffe0fff0ggg0hhh0
           RISBG R1,R1,30,35,10        aaabbbcccdddeeefffe0fff0ggg0hhh0
           RISBG R0,R0,36,41,12        aaabbbcccdddeeefffgggff0ggg0hhh0
           RISBG R1,R1,36,41,12        aaabbbcccdddeeefffgggff0ggg0hhh0
           RISBG R0,R0,42,47,14        aaabbbcccdddeeefffggghhhggg0hhh0
           RISBG R1,R1,42,47,14        aaabbbcccdddeeefffggghhhggg0hhh0
           STG   R0,0(R6)              Store 6 databytes
           STG   R1,6(R6)              Store 6 databytes
           LA    R6,12(,R6)            R6 past 12 databytes
         ENDDO

This requires only half the number of iterations and has fewer pipeline dependencies.
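To check the bit-map comments and the two-register variant off the mainframe, here is a minimal Python sketch that models only the results of the RISBG sequence, not its timing. It assumes the simple form of RISBG used above: rotate the second operand left by I5 bits, then insert bits I3 through I4 (bit 0 being the most significant) into the first operand, leaving the remaining bits unchanged (no wraparound, no zero-remaining-bits flag). It also assumes the translate step leaves each 6-bit value left-justified in its byte, as the comments show. The helpers `load8`, `pack8` and `decode16` are hypothetical names for illustration only.

```python
MASK64 = (1 << 64) - 1
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# (I3, I4, I5) operands of the seven RISBG instructions in the loop
STEPS = [(6, 11, 2), (12, 17, 4), (18, 23, 6), (24, 29, 8),
         (30, 35, 10), (36, 41, 12), (42, 47, 14)]


def rotl64(value: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def risbg(r1: int, r2: int, start: int, end: int, rot: int) -> int:
    """Assumed model of RISBG r1,r2,start,end,rot: rotate r2 left by rot and
    insert bits start..end (bit 0 = MSB) into r1; other bits of r1 are kept.
    Only covers start <= end without the zero-remaining-bits flag."""
    width = end - start + 1
    mask = ((1 << width) - 1) << (63 - end)
    return (r1 & ~mask & MASK64) | (rotl64(r2, rot) & mask)


def load8(chars: str) -> int:
    """Hypothetical stand-in for translate + LG: each Base64 character becomes
    its 6-bit value shifted into the high bits of one byte (xxxxxx00)."""
    reg = 0
    for ch in chars:
        reg = (reg << 8) | (ALPHABET.index(ch) << 2)
    return reg


def pack8(reg: int) -> bytes:
    """Single-register loop body: the seven dependent RISBG steps, then the
    first 6 bytes of what STG stores (the last 2 bytes are leftovers)."""
    for start, end, rot in STEPS:
        reg = risbg(reg, reg, start, end, rot)
    return reg.to_bytes(8, "big")[:6]


def decode16(chars: str) -> bytes:
    """Two-register loop body: R0 and R1 each run their own chain, and no
    step on one register reads the other."""
    r0, r1 = load8(chars[:8]), load8(chars[8:16])
    for start, end, rot in STEPS:  # one step on R0, then the same on R1
        r0 = risbg(r0, r0, start, end, rot)
        r1 = risbg(r1, r1, start, end, rot)
    return r0.to_bytes(8, "big")[:6] + r1.to_bytes(8, "big")[:6]


print(pack8(load8("SGVsbG8h")))       # b'Hello!'
print(decode16("SGVsbG8hV29ybGQh"))   # b'Hello!World!'
```

In this model each step k only reads source bits 8k through 8k+5, which no earlier step has overwritten, so the in-place sequence is safe. Dropping any one (I3, I4, I5) entry from the R1 chain leaves that 6-bit group unpacked in the second half of the output.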
I was quite surprised to find that the second version takes about as long as the first (in some runs it even appears to be a little slower). Have I introduced some weird dependency between the loads, the two stores, or R0 and R1 that cancels out the expected gain? Any suggestions?

Fred!