Amen. (And for once slightly interesting). Maybe post this to IBM-MAIN too?
-----Original Message----- From: IBM Mainframe Assembler List [mailto:[email protected]] On Behalf Of David Bond Sent: Wednesday, June 4, 2014 09:02 To: [email protected] Subject: Re: XR vs SR

Anyone who thinks that the S/360 instruction timings have any relevance to how machines work today has no understanding of the last several decades of processor design. Yes, simple instructions generally execute faster than more complex instructions. But even that rule of thumb is overshadowed by pipeline stalls caused by register dependencies, Address Generation Interlock, address translation, cache effects, branch prediction, and other things.

In the specific case of XR vs SR, both are the same speed on probably any machine since 1975. But neither can be faster than LHI, because XR and SR set the condition code and LHI does not. Setting the condition code is a separate suboperation. The difference in length between XR/SR and LHI does not make up for the fact that XR and SR are more complex than LHI. Furthermore, if the i-cache has any measurable effect, then alignment or misalignment of blocks of instructions to i-cache boundaries will almost always have a bigger effect than individual instruction length.

And while LA can be used in many cases where LHI can, and the two are the same length, at least on some recent machines there was a difference in speed that depended on how the register was subsequently used. On those machines, the result of LA can be fed into address generation faster than the result of a load operation such as LHI, and the results of load operations can be used in address generation faster than the results of arithmetic operations (SR and XR). It is faster to set an index register using LA than using LHI, and much faster than using XR/SR. But LA is slower when the result is used for arithmetic operations: there is a cost in moving values into and out of the arithmetic unit.

But really, choosing how to zero a register makes a difference of only a cycle or two.
The things that really matter (in descending order of importance) are: algorithm design, cache misses, and pipeline stalls.
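For reference, the four register-zeroing alternatives discussed above look like this in HLASM-style assembler (an illustrative sketch, not a timing claim; R1 is assumed to be EQUated to 1 in the usual way):

```assembly
* Four ways to zero general register 1, with condition-code effects:
         XR    R1,R1           exclusive-OR with itself; sets CC to 0
         SR    R1,R1           subtract from itself; sets CC to 0
         LHI   R1,0            load halfword immediate; CC unchanged
         LA    R1,0            load address 0; CC unchanged
```

Per the discussion above, the last two leave the condition code alone, which is why they can be cheaper than XR/SR despite the longer encoding, and LA can feed address generation with the least delay.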
