I was trying to suggest a scheme for measuring the execution time of the various instructions mentioned in recent posts, one that would remove pipeline variability entirely: no pipeline stalls possible, only one cache line involved, no L2-Ln cache involved, and nothing else that causes variability. If even this kind of experiment is impossible, then I will agree with John Gilmore in theory and in practice; otherwise I must disagree in theory. But personally I still do not regard these minutiae as important when writing code unless the code may run a million times per second while disabled. That is the opinion I have consistently posted for the nine years that I have been reading and posting on IBM-MAIN.

I had an S/360 Model 30 to play with in the late 1960s. There was an IBM book listing all the instruction timings for that model at that time, and I often referred to it while writing code. Years later I understood the variability, and the futility of trying to find the fastest way to perform any given function, so I quit caring. But out of ancient habit I still use whichever instruction was fastest on the S/360 Model 30. E.g., today I still use SLR Rx,Rx to zero general purpose register x, because SLR was the fastest way to do it on that Model 30 in the late 1960s, when I was learning how to code all the various instructions. Nowadays XR, SR, or LHI may be "faster" than SLR in some pipelines, but I still code SLR unless my code is going to run a million times per second while disabled or there is some other important reason to count the picoseconds. The time I would spend thinking about another way to zero the register costs a gazillion times more than the saved picoseconds will ever buy back.
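The repeat-and-take-the-minimum idea behind such a measurement can be sketched at a much higher level. The Python below is a hypothetical illustration only, not a way to time S/360 or z/Architecture instructions: interpreter loop and call overhead dwarfs any single machine instruction, and the names `min_average_time`, `zero_by_xor`, and `zero_by_sub` are invented for this sketch. It runs an operation many times per trial, averages, and reports the minimum across trials, on the assumption that interference can only add time. Python's own `timeit` documentation gives similar advice for the results of `Timer.repeat`.

```python
import time
from typing import Callable, List, Tuple


def min_average_time(op: Callable[[], object], iterations: int = 100_000,
                     trials: int = 7) -> Tuple[float, List[float]]:
    """Run `op` `iterations` times per trial; return (best, per_trial).

    Each per-trial value is the average nanoseconds per call, including
    the Python loop and call overhead. The minimum across trials is
    reported because interference (other work, cache misses, interrupts)
    can only add time, never remove it.
    """
    if iterations <= 0 or trials <= 0:
        raise ValueError("iterations and trials must be positive")
    per_trial: List[float] = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            op()
        elapsed = time.perf_counter_ns() - start
        per_trial.append(elapsed / iterations)
    return min(per_trial), per_trial


def zero_by_xor(x: int = 12345) -> int:
    # Hypothetical stand-in for "XR Rx,Rx": x ^ x is always 0.
    return x ^ x


def zero_by_sub(x: int = 12345) -> int:
    # Hypothetical stand-in for "SR Rx,Rx" / "SLR Rx,Rx": x - x is always 0.
    return x - x


if __name__ == "__main__":
    for name, fn in (("xor", zero_by_xor), ("sub", zero_by_sub)):
        best, _ = min_average_time(fn)
        print(f"{name}: {best:.1f} ns per call (minimum of averages)")
```

Any difference the two stand-ins show reflects the interpreter on that particular machine, which is rather the point: the number is valid only for the exact configuration that produced it.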
I still believe that in theory there should be a way to accurately measure the average execution time over a vast number of executions of the same specific instruction, so that a reliable timing can be determined. Probably only processor ALU engineers could design an experiment that does this accurately with all variables removed, and such a timing would be valid only for that one model, at its particular EC level and with many other details specified. The time derived would be the absolute minimum possible for the instruction, because any other variable could only make the execution time longer.

Bill Fairchild
----- Original Message -----
From: "John Gilmore" <[email protected]>
To: [email protected]
Sent: Tuesday, June 3, 2014 1:31:01 PM
Subject: Re: XR vs SR

Bill Fairchild and I almost always agree viscerally, and we usually agree about the details too. This time we disagree.

What I tried to say in an earlier post is that a single global answer to the question 'What is "the fastest way to zero a register"?' is no longer available. In fact, this and all such questions are delusory. Any answer must be local and contextual. Pipelining has made it so.

John Gilmore, Ashland, MA 01721 - USA
