FWIW, the 'latest' timing figures I had (from Amdahl or Circle
Computer Group, then from IBM) were:
* 1995 Hitachi Skyline (bipolar): RR operations = 3ns; cached SS
ops = 6-10ns; non-cached SS ops = 60-80ns.
* 2000+ IBM (CMOS): RR ops = 20ns.
The 1972 System/370 Model 145 might have done it in microseconds, but
current processors do it in nanoseconds.
It is the instruction cache faults and SS ops that degrade performance.
RR instructions are on average 20 times faster than SS ones: registers
are hard-wired, virtual storage (VS) is not.
You can try something like the following to check whether SLR/SR is
faster than XR: run it with the SR coded, then again with the XR coded
instead, and check for any 'average' CPU time difference between runs:
         TITLE 'LOOPTEST: CHECK ''S(L)R'' VS ''XR'''
         PRINT ON,GEN
*
*---------------------------------------------------------------------*
* PROGRAM TO CHECK CPU DIFFERENCES : 'S(L)R' VS 'XR'                  *
*---------------------------------------------------------------------*
*
LOOPTEST CSECT                     START CONTROL SECTION
         EQUREGS                   EQUATE REGISTERS
*
BEGIN    STM   R14,R12,12(R13)     SAVE REGISTERS 14->12
         LR    R2,R15              R2 <-- EP
         USING LOOPTEST,R2         SAY SO
         ST    R13,SAVEBLK+4       BACKWARD POINTER
         LR    R14,R13             COPY CALLER'S R13
         LA    R13,SAVEBLK         R13 <-- MY SAVEAREA
         ST    R13,8(,R14)         FORWARD POINTER
*
         LHI   R4,X'7FFF'          R4 <-- 32767
         B     BIGLOOP             FORCE INSTRUCTION PREFETCH
         DC    64F'0'              INSTRUCTION CACHE FILLER
*
BIGLOOP  DS    0H                  OUTER LOOP
         LHI   R5,X'7FFF'          R5 <-- 32767
*
LITLOOP  DS    0H                  INNER LOOP
         SR    R12,R12             'SR' CHECK (EITHER DO THIS ...
*        XR    R12,R12             'XR' CHECK ... OR ELSE DO THAT)
         LHI   R12,X'7FFF'         PUT SOMETHING BACK IN R12
*
SKIPFILL DS    0H                  LOOP UNTIL DONE
         BCT   R5,LITLOOP          DO INNER LOOP
         BCT   R4,BIGLOOP          DO OUTER LOOP
*
RETURN   DS    0H                  EXIT
         L     R13,SAVEBLK+4       CALLER'S R13
         LM    R14,R12,12(R13)     RESTORE REGISTERS
         XR    R15,R15             CLEAR RETURN CODE
         BSM   0,R14               BACK TO CALLER
*
SAVEBLK  DC    18F'0'              18 FULLWORDS SAVEAREA INIT F'0'
*
         END   BEGIN               START FROM BEGIN ONWARDS
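To turn the CPU times from the two variants into a per-instruction figure, something like the rough Python sketch below may help. It assumes you have noted the step CPU seconds for several runs of each variant by hand. The function name and the sample times are made up for illustration and are not measurements. Both variants carry the same LHI and BCT overhead, so only the difference between them means anything.

```python
from statistics import mean

# The test program runs the inner body 32767 * 32767 times
# (both LHI loop counts are X'7FFF').
OUTER = 0x7FFF
INNER = 0x7FFF
ITERATIONS = OUTER * INNER


def per_instruction_delta_ns(sr_cpu_seconds, xr_cpu_seconds):
    """Average CPU-time difference per loop iteration, in nanoseconds.

    Positive means the SR variant used more CPU than the XR variant.
    """
    delta_seconds = mean(sr_cpu_seconds) - mean(xr_cpu_seconds)
    return delta_seconds * 1e9 / ITERATIONS


if __name__ == "__main__":
    # Hypothetical step CPU times in seconds, NOT real measurements.
    sr_runs = [4.31, 4.29, 4.30]
    xr_runs = [4.30, 4.31, 4.29]
    print(f"iterations per run : {ITERATIONS}")
    print(f"SR minus XR, ns/op : {per_instruction_delta_ns(sr_runs, xr_runs):.4f}")
```

With roughly a billion iterations per run, a difference of one second of CPU time between the variants works out to a little under 1ns per instruction, so any real difference is likely to be buried in run-to-run noise unless you average over a fair number of runs.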
My ha'pennyworth.
John Gilmore wrote:
The fact that the System/370 Model 145 instruction timings are later
than those for the System/360 does not enhance their value. Some
simple souls judge that later is always better; but, while this is
certainly true for a loaf of bread, the inference from such examples
to current instruction timings is faulty.
Special casing/optimizing is pervasive in the implementations of
z/Architecture instructions; and here in particular the two cases
| SR Ri,Ri
| XR Ri,Ri
and
| SR Ri,Rj i ¬= j
| XR Ri,Rj i ¬= j
are trivially easy to distinguish at the hardware level. There is
indeed evidence, very persuasive but not unfortunately conclusive
evidence, that the two register-zeroing special cases are identified
and optimized on some and perhaps all z/Architecture models.
Moreover, while reading things into other people's posts is always a
perilous undertaking, it seems to me that some authoritative posts in
this thread have come about as close to saying this as the proprieties
involved make possible.
John Gilmore, Ashland, MA 01721 - USA
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN