Re: Pipeline question

Fred van der Windt Sun, 14 Aug 2011 23:17:47 -0700

> I need 27 instructions (maybe more because of two JO after TROT) see end of 
> this post.
>
> But I am sure this code is faster than what was used before. The simple 
> relation 256 input bytes processed by 27 instructions to create 192 bytes -
> should speed up the whole precess by a factor (even if the TROT is slowing 
> stuff down).


I uploaded your code and tested it last weekend (lovely to have the LPAR all 
to yourself). It seems you were somewhat overly optimistic about the effects 
of 'using only 27 instructions to process 256 input bytes'. The code is more 
than 4 times slower than the original 'process 8 bytes using RISBG' variant.
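
For anyone who has not followed the whole thread, here is a rough Python 
sketch of the general shape of a 'translate, then pack 8 bytes into 6' 
decoder. It is not the assembler code that was tested: the names ALPHABET, 
TABLE and decode are made up for illustration, the standard Base64 alphabet 
is assumed, and the inner shift/OR loop only stands in for what a handful of 
rotate-and-insert instructions would do in registers.

```python
import base64

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# 256-entry translate table: Base64 character -> 6-bit value (0..63).
# Bytes outside the alphabet map to 0 here; real code would have to flag them.
TABLE = bytearray(256)
for value, char in enumerate(ALPHABET):
    TABLE[char] = value


def decode(source: bytes) -> bytes:
    """Decode unpadded Base64 input whose length is a multiple of 8."""
    assert len(source) % 8 == 0
    # Pass 1: one table lookup per source byte (the translate step).
    sextets = source.translate(bytes(TABLE))
    out = bytearray()
    # Pass 2: read 8 bytes, squeeze out the two zero bits of each, write 6.
    for i in range(0, len(sextets), 8):
        word = 0
        for s in sextets[i:i + 8]:
            word = (word << 6) | s
        out += word.to_bytes(6, "big")
    return bytes(out)


# Round trip against the standard library (48 bytes encode without padding).
data = bytes(range(48))
assert decode(base64.b64encode(data)) == data
```

The point of the sketch is only that each source byte is touched a small, 
fixed number of times, with the bit shuffling done on values already loaded.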

I think the memory accesses are killing you. The original code does one TROO to 
convert the Base64 characters to bytes with a value of 0...63, and then every 
source byte is read once and every result byte is written once. Assuming a 
translate instruction does 2 reads and 1 write per byte, and given that 4 
source bytes yield 3 result bytes, this comes to 3 reads and 1.75 writes for 
every source byte. Your code accesses memory a lot more. I tried a quick 
tally:

                                Access
  MVC   INTER,TR1               256*RW
  TR    INTER,INPUT             256*RRW
  MVC   INTER2,TR2              256*RW
  TR    INTER2,INTER-1          256*RRW
  MVC   RESULT,INTER2           192*RW
  NC    RESULT,TR31+2           192*RRW
  NC    INTER2,TR31             192*RRW
  TR    INTER2,TR32             256*RRW
  OC    RESULT,INTER2           192*RRW
  LA    R1,TR4
  LA    R14,INTER
  LA    R15,64
  LA    R2,INTER+128
  TROT  R14,R2,1                64*RRRWW
  JO    *-4
  MVC   INTER2,TR6              256*RW
  TR    INTER2,INTER-1          256*RRW
  OC    RESULT,INTER2           256*RRW
  LA    R1,TR5
  LA    R14,INTER
  LA    R15,64
  LA    R2,INTER+192
  TROT  R14,R2,1                64*RRRWW
  JO    *-4
  MVC   INTER2,TR6-1            256*RW
  TR    INTER2,INTER-1          256*RRW
  OC    RESULT,INTER2           192*RRW

6208 bytes read and 3776 bytes written to process 256 bytes means 24.25 reads 
and 14.75 writes for every source byte (versus 3 reads and 1.75 writes).
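
The totals can be checked with a few lines of Python. This is only the 
arithmetic of the tally, using the same assumed access pattern per 
instruction (R = one byte read, W = one byte written); nothing here is 
measured. The names TALLY, ORIGINAL and totals are made up for this sketch, 
and the 'load'/'store' entries are just labels for 'read each source byte 
once' and 'write each result byte once'.

```python
# (instruction, bytes processed, assumed accesses per byte)
TALLY = [
    ("MVC", 256, "RW"),
    ("TR", 256, "RRW"),
    ("MVC", 256, "RW"),
    ("TR", 256, "RRW"),
    ("MVC", 192, "RW"),
    ("NC", 192, "RRW"),
    ("NC", 192, "RRW"),
    ("TR", 256, "RRW"),
    ("OC", 192, "RRW"),
    ("TROT", 64, "RRRWW"),
    ("MVC", 256, "RW"),
    ("TR", 256, "RRW"),
    ("OC", 256, "RRW"),
    ("TROT", 64, "RRRWW"),
    ("MVC", 256, "RW"),
    ("TR", 256, "RRW"),
    ("OC", 192, "RRW"),
]

# The original variant: one translate pass, then one read per source byte
# and one write per result byte (256 bytes in, 192 bytes out).
ORIGINAL = [
    ("TROO", 256, "RRW"),
    ("load", 256, "R"),
    ("store", 192, "W"),
]

SOURCE_BYTES = 256


def totals(tally):
    """Return (bytes read, bytes written) for a list of tally entries."""
    reads = sum(count * pattern.count("R") for _, count, pattern in tally)
    writes = sum(count * pattern.count("W") for _, count, pattern in tally)
    return reads, writes


for name, tally in (("27 instructions", TALLY), ("original", ORIGINAL)):
    reads, writes = totals(tally)
    print(name, reads, writes, reads / SOURCE_BYTES, writes / SOURCE_BYTES)
```

Under these assumptions the script gives 6208 reads and 3776 writes for the 
27-instruction version, and 768 reads and 448 writes for the original.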

We've seen simple 'rules of thumb' on writing faster code before in this group. 
Maybe we should add something about 'minimizing memory access':

1. Select the best algorithm
2. Minimize the number of memory accesses
3. Minimize the number of instructions

We also had something about using 'simple' versus 'complex' (millicoded) 
instructions but that also tends to invoke religious discussions about 
maintainability and such...

Fred!
