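BTW, change "LHI 4,4096*512" and "LHI 5,4096*1024" to something like "LHI 4,4096*8-1" etc., or they will not fit in a halfword. LHI takes a signed halfword immediate, so the largest count it can load is 32767 (4096*16-1 = 65535 is still too big). I just wrote it "off the top of my head" without checking. :-(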
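A quick way to check which loop counts fit: the immediate operand of LHI is a signed 16-bit field, so it can only hold -32768 through 32767. The short Python sketch below is only an illustration (the helper names fits_lhi and low16_signed are made up for this example). It tests the values mentioned above and also shows what the low-order 16 bits of each value would mean once sign-extended, which is how the CPU interprets the I2 field.

```python
# Range of the signed 16-bit immediate (I2) field used by LHI.
HALFWORD_MIN = -(2 ** 15)   # -32768
HALFWORD_MAX = 2 ** 15 - 1  #  32767


def fits_lhi(value: int) -> bool:
    """Return True if value can be encoded in LHI's signed halfword immediate."""
    return HALFWORD_MIN <= value <= HALFWORD_MAX


def low16_signed(value: int) -> int:
    """Sign-extend the low-order 16 bits of value, as the CPU does with an I2 field."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


# The loop counts discussed in this thread.
candidates = {
    "4096*512": 4096 * 512,
    "4096*1024": 4096 * 1024,
    "4096*16-1": 4096 * 16 - 1,
    "4096*8-1": 4096 * 8 - 1,
}

for text, value in candidates.items():
    print(f"{text:>10} = {value:>8}  fits={fits_lhi(value)}  low16={low16_signed(value)}")
```

If only the low-order 16 bits survived, 4096*512 would become 0 and 4096*16-1 would become -1. Neither stops the loop early: BCT branches whenever the decremented register is nonzero, so the loop would run about 2**32 times. Counts above 32767 can instead be loaded from a fullword, e.g. L 4,=F'2097152'.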
On 14/08/2019 03:24, CM Poncelet wrote:
> FWIW
>
> On the Hitachi Skyline bipolar mainframe (from 1995), the instruction
> processor speeds were:
> - RR: 3ns.
> - SS: 6-10ns if L2 cached, else 60-80ns if data-fetched from central
>   storage.
>
> On IBM's CMOS G4 mainframes:
> - RR: 20ns approx.
> - SS: no idea, did not check.
>
> IBM said it had 'improved' its CMOS processors since then, but I do not
> know whether this includes 'improved' RR times.
>
> As regards loops, try to ensure that they fit entirely within the
> instruction register (was 128 bytes) to avoid instruction cache faults.
>
> E.g. try something like the following (adjust as required) to see what
> happens with instruction cache faults:
>
> WHATEVER CSECT                  OUR ENTRY POINT
>          USING WHATEVER,15        USE R15 AS OUR BASE REGISTER
> INIT     STM   14,12,12(13)       STORE CALLER'S REGISTERS
>          ST    13,SAVEAREA+4      OUR BACKWARD POINTER
>          LR    4,13               USE R4 AS TEMP CALLER'S R13
>          LA    13,SAVEAREA        USE R13 FOR OUR SAVEAREA
>          ST    13,8(4)            CALLER'S FORWARD POINTER
>          LHI   4,4096*512         USE R4 AS OUTER LOOP COUNTER
> *
> OUTLOOP  CNOP  0,4                OUTER LOOP START
>          LHI   5,4096*1024        USE R5 AS INNER LOOP COUNTER
> *
> INLOOP   CNOP  0,4                INNER LOOP START
>          B     CONTINUE           BRANCH PAST INSTRUCTION CACHE STORAGE
> IN       DS    F                  STORAGE INSIDE INSTRUCTION CACHE
> CONTINUE DS    0F                 NOW CARRY ON
>          ST    5,OUT              STORE R5 OUTSIDE THE INSTR CACHE <--
> *        ST    5,IN               STORE R5 INSIDE THE INSTR CACHE  <--
>          BCT   5,INLOOP           DECREMENT AND BACK TO INNER LOOP
>          BCT   4,OUTLOOP          DECREMENT AND BACK TO OUTER LOOP
>          B     EXIT               DON'T FALL INTO THE DATA AREAS BELOW
> FILL     DS    XL(132+WHATEVER-*) FILL UP THE INSTRUCTION CACHE
> OUT      DS    F                  STORAGE OUTSIDE INSTRUCTION CACHE
> SAVEAREA DS    18F                OUR REGISTER SAVE AREA
> *
> EXIT     L     13,4(13)           RESTORE CALLER'S REGISTERS
>          LM    14,12,12(13)
>          SR    15,15              RETURN CODE 0
>          BR    14                 BACK TO CALLER
>          END   WHATEVER           START AT WHATEVER'S EP
>
> Using the "ST 5,IN" above instead of the "ST 5,OUT" causes an instruction
> cache fault at every INLOOP iteration, which results in approx 20 times
> more CPU time than the "ST 5,OUT".
>
> When possible, use registers instead of L2 cache for temporary storage,
> because register-to-register (RR) operations are much faster than
> storage-to-storage (SS) operations.
>
> Avoid MVCL: use MVC iteratively in a loop.
>
> HTH.
>
> Cheers, Chris Poncelet (retired sysprog)
>
>
> On 13/08/2019 16:39, Steve Smith wrote:
>> Write good code and forget about instruction timings. With any luck your
>> code will have to perform on several generations of architecture and
>> machines.
>>
>> There's a big difference between B- (base-index-displacement) branches and
>> J- (or BR-) (relative address) instructions. Surely by now, this should go
>> without saying. Regardless of whether they're "faster" or not, they are
>> much better, and as that is well-documented, I won't belabor it.
>>
>> sas
>>
>> ----------------------------------------------------------------------
>> For IBM-MAIN subscribe / signoff / archive access instructions,
>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
>> .