Re: Instruction speeds

CM Poncelet Tue, 13 Aug 2019 19:23:57 -0700

FWIW
 
On the Hitachi Skyline bipolar mainframe (from 1995), the instruction
processor speeds were:
- RR: 3ns.
- SS: 6-10ns if L2 cached, else 60-80ns if data-fetched from central
storage.
 
On IBM's CMOS G4 mainframes:
- RR: 20ns approx.
- SS: no idea, did not check.
 
IBM said it had 'improved' its CMOS processors since then, but I do not
know whether this includes 'improved' RR times.
 
As regards loops, try to ensure that they fit entirely within the
instruction register (was 128 bytes) to avoid instruction cache faults. 
 
E.g. try something like the following (adjust as required) to see what
happens with instruction cache faults:
WHATEVER CSECT                     OUR ENTRY POINT
         USING WHATEVER,15         USE R15 AS OUR BASE REGISTER
INIT     STM   14,12,12(13)        STORE CALLER'S REGISTERS
         ST    13,SAVEAREA+4       OUR BACKWARD POINTER
         LR    4,13                USE R4 AS TEMP CALLER'S R13
         LA    13,SAVEAREA         USE R13 FOR OUR SAVEAREA
         ST    13,8(4)             CALLER'S FORWARD POINTER
         L     4,=A(4096*512)      USE R4 AS OUTER LOOP COUNTER
*
OUTLOOP  CNOP  0,4                 OUTER LOOP START
         L     5,=A(4096*1024)     USE R5 AS INNER LOOP COUNTER
*
INLOOP   CNOP  0,4                 INNER LOOP START
         B     CONTINUE            BRANCH AROUND IN-CACHE STORAGE
IN       DS    F                   STORAGE INSIDE INSTRUCTION CACHE
CONTINUE DS    0F                  NOW CARRY ON
         ST    5,OUT               R5 OUTSIDE INSTRUCTION CACHE <--
*        ST    5,IN                R5 INSIDE INSTRUCTION CACHE  <--
         BCT   5,INLOOP            DECREMENT AND BACK TO INNER LOOP
         BCT   4,OUTLOOP           DECREMENT AND BACK TO OUTER LOOP
         B     EXIT                BRANCH AROUND THE DATA AREAS
FILL     DS    XL(132+WHATEVER-*)  FILL UP THE INSTRUCTION CACHE
OUT      DS    F                   STORAGE OUTSIDE INSTRUCTION CACHE
SAVEAREA DS    18F                 OUR REGISTER SAVE AREA
*
EXIT     L     13,4(13)            RESTORE CALLER'S REGISTERS
         LM    14,12,12(13)
         BR    14                  BACK TO CALLER
         END   WHATEVER            START AT WHATEVER'S EP


Uncommenting the "ST   5,IN" above (in place of the "ST   5,OUT") causes
an instruction cache fault on every INLOOP iteration, which uses
approximately 20 times more CPU than the "ST   5,OUT" version.
 
When possible, use registers instead of L2 cache for temporary storage,
because register-to-register operations (RR) are much faster than
storage-to-storage operations (SS).
 
Avoid MVCL: use MVC iteratively in a loop.
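
As a rough sketch of the MVC-in-a-loop approach (not taken from any
measured program): the following moves MOVELEN bytes in 256-byte blocks
and then uses EX to move whatever is left. The names MVCLOOP, MOVELEN,
SOURCE, TARGET, MOVE256, REMAIN and MOVEREST are made up for
illustration, and the length is assumed to be greater than zero.

```hlasm
MVCLOOP  CSECT                     HYPOTHETICAL EXAMPLE NAME
MOVELEN  EQU   1000                ASSUMED LENGTH TO MOVE (MUST BE > 0)
         USING MVCLOOP,15          USE R15 AS OUR BASE REGISTER
         STM   14,12,12(13)        STORE CALLER'S REGISTERS
         LA    2,TARGET            R2 = TARGET ADDRESS
         LA    4,SOURCE            R4 = SOURCE ADDRESS
         LHI   3,MOVELEN           R3 = LENGTH TO MOVE
         BCTR  3,0                 R3 = LENGTH MINUS 1
         LR    5,3                 COPY LENGTH MINUS 1 TO R5
         SRL   5,8                 R5 = NUMBER OF FULL 256-BYTE BLOCKS
         LTR   5,5                 ANY FULL BLOCKS?
         BZ    REMAIN              NO, MOVE THE REMAINDER ONLY
MOVE256  MVC   0(256,2),0(4)       MOVE ONE 256-BYTE BLOCK
         LA    2,256(,2)           BUMP TARGET POINTER
         LA    4,256(,4)           BUMP SOURCE POINTER
         BCT   5,MOVE256           REPEAT FOR EACH FULL BLOCK
REMAIN   EX    3,MOVEREST          MOVE THE LAST 1 TO 256 BYTES
         LM    14,12,12(13)        RESTORE CALLER'S REGISTERS
         SR    15,15               ZERO RETURN CODE
         BR    14                  BACK TO CALLER
MOVEREST MVC   0(0,2),0(4)         EXECUTED: LENGTH FROM LOW BYTE OF R3
SOURCE   DC    (MOVELEN)C'X'       SAMPLE SOURCE DATA
TARGET   DS    XL(MOVELEN)         TARGET FIELD
         END   MVCLOOP
```

With MOVELEN set to 1000, MOVE256 runs three times (768 bytes) and the
EX moves the remaining 232 bytes.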
 
HTH. 
 
Cheers, Chris Poncelet (retired sysprog)

 
 
 


On 13/08/2019 16:39, Steve Smith wrote:
> Write good code and forget about instruction timings.  With any luck your
> code will have to perform on several generations of architecture and
> machines.
>
> There's a big difference between B- (base-index-displacement) branches and
> J- (or BR-) (relative address) instructions.  Surely by now, this should go
> without saying.  Regardless of whether they're "faster" or not, they are
> much better, and as that is well-documented, I won't belabor it.
>
> sas


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
