Pipelining a machine and adding caches does throw a monkey wrench into
the discussion. Add interrupts and you really have a mess. That is one
reason why the performance guys like to preface every sentence with
"YMMV" or "It depends" :-)  

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Ray Mullins
Sent: Monday, December 04, 2006 12:40 PM
To: [email protected]
Subject: Re: CMSCALL return code

There is a new option now, especially with non-zero codes:

LHI  R15,4

No storage fetch.

The subject of instruction timings on IBM-MAIN and ASSEMBLER-LIST comes
up now and then.  I point y'all to the archives of both lists.  With the
new z/Architecture pipelines and caches, sometimes what seems at first
to be illogical instruction placement may actually be better.
A hypothetical example:


L    R4,RECPTR    Load address of pointer
AHI  R6,1         Add 1 to counter
AHI  R8,(-8)      Some other strange counter
CLI  16(R4),X'40'
JE   GOHERE

The z/Architecture processor will execute the two AHI instructions while
the base/displacement calculation and storage access for the L
instruction are occurring, because it knows that R4 isn't affected by
those instructions.
By the time the CLI is reached, R4 will contain the address, avoiding
the delay that might occur if you code

AHI  R6,1         Add 1 to counter
AHI  R8,(-8)      Some other strange counter
L    R4,RECPTR    Load address of pointer
CLI  16(R4),X'40'
JE   GOHERE

In this case, there might be a delay at the CLI.

Speaking of branches there's been an interesting discussion recently
about the branch-prediction logic in z/Architecture, which is why I
demonstrate with the R&I (or is it I&R? I can never remember)
instruction.

Later,
Ray

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Mike Walter
Sent: Monday December 04 2006 12:02
To: [email protected]
Subject: Re: CMSCALL return code

Sheesh, this goes way back to my good old Assembler diaper days when
programmers really cared about performance instead of drag and drop
solutions.
Slightly off-topic: if I remember correctly, we argued intensely about
zeroing a GPR and the performance differences between: 

- SR R15,R15
- XR R15,R15
- LA R15,0    (not seriously considered by performance geeks)
- L R15,=F'0' (considered for use only by amateur programmers coming
from a BASIC or COBOL background and otherwise held in low esteem by
"real programmers").  ;-)

IIRC, the actual performance difference between SR and XR depended more
on the specific processor model than on anything else.
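
For reference, a side-by-side sketch of the candidates listed above,
plus the LHI form mentioned elsewhere in this thread. It is a fragment
rather than a complete program: it assumes R15 is EQUated to 15 and that
the literal pool is addressable. The comments state only instruction
format and length; they make no claim about timing on any particular
model.

```hlasm
*        Zeroing R15 (R15 assumed to be EQUated to 15)
         SR    R15,R15       RR, 2 bytes; sets the condition code
         XR    R15,R15       RR, 2 bytes; sets the condition code
         LA    R15,0         RX, 4 bytes; no operand fetch
         LHI   R15,0         RI, 4 bytes; immediate operand
         L     R15,=F'0'     RX, 4 bytes; fetches a word from storage
```

LA and LHI can also supply a non-zero value (LA up to 4095, LHI any
signed halfword), and both leave the condition code unchanged.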

Mike Walter
Hewitt Associates
Any opinions expressed herein are mine alone and do not necessarily
represent the opinions or policies of Hewitt Associates.




"Schuh, Richard" <[EMAIL PROTECTED]> 

Sent by: "The IBM z/VM Operating System" <[email protected]>
12/04/2006 11:37 AM
Please respond to "The IBM z/VM Operating System" <[email protected]>
To: [email protected]
Subject: Re: CMSCALL return code

True, and it is undoubtedly faster to use SR  R15,R15 than it is to use
LA R15,0 to zero the register - there are no storage fetches and real
subtraction is not needed if the result can be predicted, as it can in
this case. However, the discussion had more to do with fetches of
boundary-aligned vs. non-aligned data. There was no mention of the
optimum speed for getting either a specific or an arbitrary value loaded
into a register. In this day of pipelined machines
 
This is sort of reminiscent of the good old days, programming in 7080
Autocoder. Boeing insisted that the programmers use a MOVE macro because
there were 26 different ways to move data from one storage location to
another. It was expected that most programmers would use either their
favorite way or the first one that popped into their heads if left on
their own. The macro chose the optimal way, depending on the operand
definitions.

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Stanley Rarick
Sent: Friday, December 01, 2006 10:37 PM
To: [email protected]
Subject: Re: CMSCALL return code

For a return code, LA R15,value is *much* faster than a L - only one
storage fetch.

Schuh, Richard wrote:
I really would not have left it to chance, I would have defined a
word-aligned constant rather than using a literal. However, it might not
have been as chancy as it may seem. The literal pool is doubleword
aligned and boundary alignment may have been a factor in determining
where the literal resided. I would like to think that the 8-byte
multiples are put at the front, the 4-byters next, then the twos
followed by everybody else. In looking at an assembly listing, that
seems to be the sequence. The first two literals in the program are
=x'0000A00', the next =x'FF', etc. In the literal pool, all 4 byte
entries (there were no 8 byte literals) precede the two byte literals
and then come the ones of only 1 byte. Within each of these groups, the
literals appear in the order in which they were defined.
There were no long strings defined as literals in the particular
listing. 
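
The ordering described above can be pictured as if the pool had been
coded by hand. This is only a sketch: the five literals and the label
POOL are made up for illustration, and the layout shown is the one
observed in the listing above, not a statement of what every assembler
release does.

```hlasm
* Hypothetical literals, in order of first use in the source:
*     =X'FF'   =F'4'   =H'12'   =X'00000A00'   =D'0'
* The same constants, laid out in the order described above:
POOL     DS    0D            Pool starts on a doubleword boundary
         DC    D'0'          8-byte entries first
         DC    F'4'          then 4-byte entries, in order of use
         DC    X'00000A00'
         DC    H'12'         then 2-byte entries
         DC    X'FF'         then everything else
```

With this layout every 4-byte entry lands on a fullword boundary
whatever its type, because only multiples of 8 bytes precede the 4-byte
group.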

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Don Russell
Sent: Tuesday, November 21, 2006 3:46 PM
To: [email protected]
Subject: Re: CMSCALL return code

Schuh, Richard wrote:
 
I agree, it does seem non-intuitive. The initial SR   R15,R15 was
undoubtedly preparing for a default rc of zero. How the non-zero rc gets
put into the register later is largely a matter of taste. In this case I
probably would have chosen L   R15,=X'...' - a habit learned, when
machines were slower, based on the knowledge that they were mostly
optimized for the LOAD instruction vs. any other way of putting data
from memory into a register.
 

If your habit was to use L Rx,=X'...' you were probably lucky in the old
days.... the =X literal would not necessarily be word-aligned, causing
two fetches to load the register, or, in the days when alignment really
mattered... a program exception.

Better to use L R15,=A(X'...') if alignment is a concern and you want to
use literals.

Then the literal IS aligned on a fullword boundary.

The initial SR 15,15 is unlikely to be setting the default return
code... it's clearing the register preparing for the different option
bytes to
be OR'd in. I agree the macro could (should?) have generated a single L
instruction instead, but then what nits would we have to discuss? :-)
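
The alternatives discussed above, side by side. This is a fragment, not
a complete program: the label RC8 and the value 8 are made up for
illustration, and R15 is assumed to be EQUated to 15 with the constants
addressable.

```hlasm
         L     R15,=X'00000008'   X-type literal, 4 bytes long
         L     R15,=A(X'08')      A-type literal, fullword by type
         L     R15,RC8            Named constant, aligned explicitly
*        ...
         DS    0F                 Force fullword alignment
RC8      DC    X'00000008'        X-type DC has no alignment of its own
```

The DS 0F is what guarantees the alignment of RC8. Without it, an
X-type DC is placed at whatever byte the location counter happens to
have reached.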

 


 