Hi

I would swap the last two STG instructions, storing R1 first and then
R0, because R0 is the register most recently written by a RISBG
instruction.
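
One thing worth checking before swapping: STG always stores 8 bytes, so the store at offset 0 and the store at offset 6 overlap in bytes 6-7. Below is a minimal C sketch of that overlap. The helper names `stg` and `store_pair` are made up for illustration, and the model only assumes big-endian 8-byte stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of STG: write a 64-bit register as 8 big-endian bytes at dst.
   (Hypothetical helper, not a real API.) */
static void stg(uint8_t *dst, uint64_t reg)
{
    assert(dst != NULL);
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(reg >> (56 - 8 * i));
}

/* Store both packed registers into a 14-byte area.
   r1_first != 0 models the swapped order (R1 first, then R0). */
static void store_pair(uint8_t out[14], uint64_t r0, uint64_t r1, int r1_first)
{
    memset(out, 0, 14);
    if (r1_first) {
        stg(out + 6, r1);
        stg(out, r0);      /* bytes 6-7 now hold R0's two leftover bytes */
    } else {
        stg(out, r0);
        stg(out + 6, r1);  /* overwrites R0's two leftover bytes */
    }
}
```

In this model the original order leaves R1's first two data bytes in bytes 6-7, while the swapped order leaves R0's leftover bytes there. If the real loop behaves the same way, the swapped version would also need the second store moved or narrowed so the two stores no longer overlap.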

Tobias

DO    FROM=(R2,)
  LMG   R0,R1,0(,R4)      Load 16 source bytes
  LA    R4,16(,R4)
  RISBG R0,R0,06,11,02    aaabbbb0ccc0ddd0eee0fff0ggg0hhh0
  RISBG R1,R1,06,11,02    aaabbbb0ccc0ddd0eee0fff0ggg0hhh0
  RISBG R0,R0,12,17,04    aaabbbccccc0ddd0eee0fff0ggg0hhh0
  RISBG R1,R1,12,17,04    aaabbbccccc0ddd0eee0fff0ggg0hhh0
  RISBG R0,R0,18,23,06    aaabbbcccdddddd0eee0fff0ggg0hhh0
  RISBG R1,R1,18,23,06    aaabbbcccdddddd0eee0fff0ggg0hhh0
  RISBG R0,R0,24,29,08    aaabbbcccdddeee0eee0fff0ggg0hhh0
  RISBG R1,R1,24,29,08    aaabbbcccdddeee0eee0fff0ggg0hhh0
  RISBG R0,R0,30,35,10    aaabbbcccdddeeefffe0fff0ggg0hhh0
  RISBG R1,R1,30,35,10    aaabbbcccdddeeefffe0fff0ggg0hhh0
  RISBG R1,R1,36,41,12    aaabbbcccdddeeefffgggff0ggg0hhh0
  RISBG R0,R0,36,41,12    aaabbbcccdddeeefffgggff0ggg0hhh0
  RISBG R1,R1,42,47,14    aaabbbcccdddeeefffggghhhggg0hhh0
  RISBG R0,R0,42,47,14    aaabbbcccdddeeefffggghhhggg0hhh0
  STG   R0,0(R6)          Store 8 bytes (6 data + 2 leftover)
  STG   R1,6(R6)          Store 8 bytes over the 2 leftover bytes
  LA    R6,12(,R6)
ENDDO
This requires only half the number of iterations and has fewer pipeline
dependencies.
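
For readers who want to check the bit diagrams in the comments, here is a small C model of what each RISBG in the loop does to one register. It is a sketch under stated assumptions: the zero flag is off, only the non-wrapping case (start bit <= end bit) is handled, and the names `rotl64`, `risbg` and `pack8to6` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits. */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Simplified model of RISBG r1,r2,i3,i4,i5 with the zero flag off:
   rotate r2 left by i5 bits, then copy bits i3..i4 of the rotated
   value into r1 (bit 0 is the leftmost bit); other bits of r1 are kept. */
static uint64_t risbg(uint64_t r1, uint64_t r2,
                      unsigned i3, unsigned i4, unsigned i5)
{
    assert(i3 <= i4 && i4 < 64);
    unsigned width = i4 - i3 + 1;
    uint64_t field = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    uint64_t mask = field << (63 - i4);
    return (r1 & ~mask) | (rotl64(r2, i5) & mask);
}

/* Apply the seven inserts from the loop to one register:
   (6,11,2), (12,17,4), ... (42,47,14). */
static uint64_t pack8to6(uint64_t r)
{
    for (unsigned k = 1; k <= 7; k++)
        r = risbg(r, r, 6 * k, 6 * k + 5, 2 * k);
    return r;
}
```

Each source byte carries its 6 data bits in the high-order positions, so after the seven inserts the leftmost 48 bits hold the 6 packed bytes and the rightmost 16 bits are leftovers from the input. Because every insert reads from bits to the right of everything already written, the seven steps on one register form a serial chain, while the R0 and R1 chains are independent of each other.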

I was quite surprised to find that the second version takes about as
long as the first version (in some runs it even appears to be a little
slower). Have I introduced some odd dependency between the two loads,
the two stores, or R0 and R1 that negates the gain? Any suggestions?
