I was trying to suggest a scheme for measuring the execution time of the 
various instructions mentioned in recent posts so that the variability of 
pipeline operation would be removed: no pipeline stalls possible, only one 
cache line involved, no L2-Ln cache involved, and nothing else that causes 
variability.  If even this kind of experiment is impossible, then I will agree 
in theory and in practice with John Gilmore; otherwise I must disagree in 
theory.  But personally I still do not regard these minutiae as important when 
writing code unless the code may run a million times per second while 
disabled, an opinion I have posted consistently for the nine years that I have 
been reading and posting on IBM-MAIN. 

I had a S/360 model 30 to play with in the late 1960s.  There was an IBM book 
listing all the instruction timings for that model at that time, and I often 
referred to that book while writing code.  Years later I understood the 
variability and the futility of finding the fastest way to do any given 
function, so I quit caring.  But out of ancient habit I still code whatever was 
fastest on a S/360 model 30.  E.g., today I still use SLR 
Rx,Rx to zero out general purpose register x because SLR was the fastest way on 
that model 30 in the late 1960s when I was learning how to code all the various 
instructions.  Nowadays XR, SR, or LHI may be "faster" in some pipelines than 
an SLR, but I still code an SLR unless my code is going to run a million times 
per second while disabled or there is some other important reason to count the 
picoseconds.  The time I spend thinking about another way to zero the register 
costs a gazillion times more than the saved picoseconds will buy back. 
  
I still believe that in theory there should be a way to measure accurately the 
average speed of a vast number of the same specific instruction so that a 
reliable timing can be determined, but probably only processor ALU engineers 
would be able to design an experimental way to do it accurately with all 
variables removed.  And such a timing value would only be valid for that one 
model with its particular EC level and many other details specified.  The time 
derived would be the absolute minimum time possible for the instruction.  
Having any other variable involved would only make the execution time longer. 
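
The statistical side of that idea can be sketched in a few lines.  This is 
only an illustration, not a way to time a single machine instruction: Python 
cannot isolate an SLR or an XR, and the names time_batches and zero_value are 
hypothetical stand-ins invented for this sketch.  It times many batches of the 
same operation and keeps both the minimum and the mean per-call time; since 
interference from anything else can only add time, the minimum is the better 
estimate of the floor.

```python
import time


def time_batches(op, batches=50, reps=10_000):
    """Time `op` in repeated batches; return (min, mean) nanoseconds per call.

    Each batch runs `op` `reps` times between two clock readings, so the
    clock overhead is spread over many calls.
    """
    per_call = []
    for _ in range(batches):
        start = time.perf_counter_ns()
        for _ in range(reps):
            op()
        elapsed = time.perf_counter_ns() - start
        per_call.append(elapsed / reps)
    return min(per_call), sum(per_call) / len(per_call)


def zero_value():
    # Hypothetical stand-in for "the instruction under test".
    return 0


best, mean = time_batches(zero_value)
print(f"minimum {best:.1f} ns/call, mean {mean:.1f} ns/call")
```

The numbers printed include loop and call overhead and will differ from run to 
run and from machine to machine, which is the same caveat as above: any such 
figure is valid only for the one configuration it was measured on.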
  
Bill Fairchild 

----- Original Message -----

From: "John Gilmore" <[email protected]> 
To: [email protected] 
Sent: Tuesday, June 3, 2014 1:31:01 PM 
Subject: Re: XR vs SR 

Bill Fairchild and I almost always agree viscerally, and we usually 
agree about the details too. 

This time we disagree.  What I tried to say in an earlier post is that 
a single global answer to the question 

What is "the fastest way to zero a register"? 

is no longer available.  In fact this and all such questions are 
delusory.  Any answer must be local and contextual.  Pipelining has 
made it so. 

John Gilmore, Ashland, MA 01721 - USA 
