Amen. (And for once slightly interesting). Maybe post this to IBM-MAIN too?

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On 
Behalf Of David Bond
Sent: Wednesday, 4 June 2014 09:02
To: [email protected]
Subject: Re: XR vs SR

Anyone who thinks that the S/360 instruction timings have any relevance to
how machines work today has no understanding of the last several decades of
processor design. Yes, simple instructions generally execute faster than
more complex instructions. But even that rule of thumb is overshadowed by
pipeline stalls caused by register dependencies, Address Generation
Interlock, address translation, cache effects, branch prediction and other
things.

In the specific case of XR vs SR, both are the same speed on probably any
machine since 1975. But neither can be faster than LHI because XR and SR set
the condition code and LHI does not. Setting the condition code is a
separate suboperation. The difference in the length of XR and SR vs LHI does
not make up for the fact that XR and SR are more complex than LHI.
Furthermore if i-cache has any measurable effect, then alignment or
misalignment of blocks of instructions to i-cache boundaries will almost
always have a bigger effect than individual instruction length.

And while LA can be used in many cases where LHI can, and the two are the
same length, on at least some recent machines there was a difference in speed
that depended on how the register was subsequently used. On those machines,
the results of LA can be fed into address generation faster than the results
of a load operation such as LHI, and the results of load operations can be
used in address generation faster than arithmetic operations (SR and XR). It
is faster to set an index register using LA than LHI and much faster than
using XR/SR. But LA is slower when the result is used for arithmetic
operations. There is a cost in moving values into and out of the arithmetic
unit.
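
As a rough illustration of the choices discussed above, here is a minimal
sketch. The register numbers are arbitrary, and the assumption that register
4 already holds the address of some table is purely hypothetical; the
comments restate the behaviour described in this post for the machines in
question and are not measurements.

```hlasm
* Four ways to set general register 1 to zero
         XR    1,1            2 bytes, sets the condition code
         SR    1,1            2 bytes, sets the condition code
         LHI   1,0            4 bytes, condition code unchanged
         LA    1,0            4 bytes, condition code unchanged
*
* Zero used as an index (register 4 = table address, assumed)
         LA    2,0            Index = 0, feeds address generation
         L     3,0(2,4)       Load first word of the table
*
* Zero used for arithmetic
         LHI   5,0            Accumulator = 0
         AR    5,3            Add the loaded word
```

Which form is preferable depends on what consumes the register next, and as
noted, the difference is at most a cycle or two.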

But really, choosing how to zero a register makes a difference of only a
cycle or two.  The things that really matter (in descending order of
importance) are: algorithm design, cache misses, and pipeline stalls.
