We probably have not seen the end of the instructions with an -N suffix.

Many of the new z/Architecture instructions reflect Power PC development
work that was keen on preventing instruction pipeline stalls.

A prior poster (Gary Weinhold, Mon, 13 Dec 2010 11:29:23 -0800) suggested
that the idea here might be to "make some microcoded instruction execute
in fewer cycles". A fair enough interpretation, but in RISC technology
design the issue is often the scheduling of subsequent instructions.  We
do not want the subsequent instructions to wait (if not necessary) for the
current instruction to finish before we get them started in parallel.

Updating anything like a status register or PSW to reflect the results of
an instruction takes time for the current instruction (as Gary has
implicitly suggested). Yet since in some cases a subsequent instruction
might be dependent upon that PSW update result, it may have to wait. The
really esoteric issue here is that even subsequent branch instructions can
get a sneaky head start if a prior instruction that is not yet done can
still be executing while we grab and begin to evaluate PSW condition code
bits. (In other words, parallelism is not just concerned with arith:arith
simultaneity, but also with arith:branch simultaneity.)

We can hurry the branch instructions and the new load on condition
instructions (LOCx instructions) if we clarify that the current
instruction has no intention of updating the condition codes.
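
To make the scheduling point concrete, here is a toy sketch in Python. It is
not a model of any real z or Power pipeline: the Instr class, the
count_cc_stalls function and the one-cycle penalty are all invented for
illustration. It assumes only that an instruction which tests the condition
code must wait when the instruction just before it still has to write the
condition code. It counts timing only and says nothing about which CC value
the branch ends up testing.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    sets_cc: bool = False   # instruction writes the condition code
    reads_cc: bool = False  # instruction (branch, LOCx) tests the condition code


def count_cc_stalls(program, penalty=1):
    """Toy model: a CC reader waits `penalty` cycles when the instruction
    immediately before it writes the CC. The penalty is an assumption."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.sets_cc and cur.reads_cc:
            stalls += penalty
    return stalls


# ALSIH updates the CC, ALSIHN does not; BRC tests the CC.
with_cc = [Instr("ALSIH", sets_cc=True), Instr("BRC", reads_cc=True)]
without_cc = [Instr("ALSIHN"), Instr("BRC", reads_cc=True)]

print(count_cc_stalls(with_cc), count_cc_stalls(without_cc))  # prints "1 0"
```

In this sketch the branch behind ALSIHN has nothing to wait for, which is the
whole attraction of declaring up front that the CC will not be touched.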

In the Power instruction mnemonics a dot-suffix designated the intent to
update the carry bit (dot present = yes, update carry bit; dot absent = no
update of carry bit). In z/Architecture the suffix -N means No-carryout.
No-carryout actually then means No-delay, that is, N = No-pipeline-stall.

Lots of instructions on the Power Architecture had a record bit,
designated "Rc" in documentation, which in the instruction itself could
be on or off.  Strictly speaking, this bit determines whether the outcome
of the operation is recorded in field CR0 of the condition register; the
carry bit proper (CA) lives in the XER. There was a similar overflow
enable bit (OE) that permitted overflow to be recorded optionally in
XER[OV]. Some instructions on PPC came in sets of four varieties, with a
dot suffix indicating that the outcome was to be recorded, and/or an
o-suffix marking the intent to record any overflow.  So the add
instruction had

add
add.
addo
addo.
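
The way the two option bits map onto those four mnemonics can be sketched in
a few lines of Python. The function name add_mnemonic is made up for this
post; only the o-suffix and dot-suffix convention comes from the PPC
documentation.

```python
def add_mnemonic(oe: bool, rc: bool) -> str:
    """Build the PPC add mnemonic from the OE and Rc option bits.
    OE on appends 'o'; Rc on appends '.'."""
    return "add" + ("o" if oe else "") + ("." if rc else "")


# Enumerate the four varieties in the same order as the list above.
for oe in (False, True):
    for rc in (False, True):
        print(f"OE={int(oe)} Rc={int(rc)} -> {add_mnemonic(oe, rc)}")
```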


"Instructions that select the overflow option (enable XER[OV]) or that set
the XER carry bit (CA) may delay execution of subsequent instructions."

See "Power PC Microprocessor Family:
The Programming Environments for 32-Bit Microprocessors,
G522-0290-01"

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2/$file/6xx_pem.pdf



These chips had an add immediate instruction

addi

which did not update the carry-out bit of the condition register.
The document states: "The addi instruction is preferred for addition
because it sets few status bits."

This No-carryout capability is a great adventure on PPC machines: the Rc
bit (optional carryout) of an instruction opcode is available on all of
the logical operations (and, nand, or, nor, xor and eqv), and can be
engaged also on rotate and shift instructions.

A variation on this scheme can be used on some Power floating point
instructions.

The Rc field is present on the negate instruction (form two's complement),
on some multiply instructions and even divide instructions!  It is not a
minor theme there on Power chips.

Perhaps compiler writers found some of this really useful.

On the z/chips, the difference between the opcode for ALSIH v ALSIHN is
just one bit, x'CCA' v x'CCB'.
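
A quick check of that one-bit claim, using the opcode values exactly as
written above (the 12-bit x'CCA' / x'CCB' notation is the one used in this
post, and the variable names are just for illustration):

```python
ALSIH = 0xCCA   # opcode as written in this post
ALSIHN = 0xCCB  # same opcode with the low-order bit on

diff = ALSIH ^ ALSIHN  # XOR leaves a 1 wherever the two opcodes differ
bits_different = bin(diff).count("1")

print(f"{diff:03X} {bits_different}")  # prints "001 1"
```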

Could we see this feature proliferate within z/Architecture?  We can only
speculate at this time.

An important issue is what instruction architecture aspects from the RISC
world will really be needed on the big box to keep it very competitive.

It is the instruction pipeline in the internal RISC CPUs now onboard the
big blue wafers that is at issue here. Preventing unnecessary stalls is
utterly important in RISC.  Compilers can exploit parallelism. Perhaps the
most important of all the No-stall instructions was the add immediate, as
it is one of the two obvious ways to bump along in arrays and data
structures, the other being an RR add.

It is not clear to this author if the No-carryout logical operations (xor,
and, etc.) are nearly so critical to pipeline optimization, much less the
multiply instructions, et al.  However, avoiding pipeline stalls is the
essence of competition in the RISC device business.  It really is the
pavement along which RISC racers speed. Instruction parallelism is the
holy grail for future machines. So we probably will see at least some more
N-suffixes, or some kind of No-pipeline-stall instruction mark.

Putting the pipeline stall issue to the side, it is also noteworthy that
ignoring the overflow in array bumping is really not all that new.
Consider the Branch on Index High and Branch on Index Low or Equal
instructions. According to "z/Architecture Principles of Operation",
document number SA22-7832-02:

For the venerable 12 bit displacement variations and the new 20 bit
displacement variations ...
"For purposes of the addition and comparison, all operands and results are
treated as 32-bit signed binary integers for BXH and BXLE or as 64-bit
signed binary integers for BXHG and BXLEG. Overflow caused by the addition
is ignored."

And for the nifty 16 bit relative address variations ...
"For purposes of the addition and comparison, all operands and results are
treated as 32-bit signed binary integers for BRXH and BRXLE or as 64-bit
signed binary integers for BRXHG and BRXLG. Overflow caused by the
addition is ignored."

For both ...
"Condition Code: The code remains unchanged."

Don Higgins,  Sat, 11 Dec 2010 15:05:00 -0500, wrote
"I have no idea why this single add instruction with no cc update was
added.
"If I had to pick just one, it seems like low order 32 bits or all 64 bit
add would have been more useful for adjusting indexes etc."


The 32 bit variations, BXH and BXLE, are a way to do this add-with-No-
carryout on the low order half of the general purpose registers.  In this
case you would have to code the branch to go nowhere else than to the next
instruction, willy-nilly.  I have never seen it done, but it would achieve
the result of No-carryout and, I think, also not necessarily stall
subsequent instructions. (But here I am mixing two subjects: the effect on
subsequent instructions of an instruction that updates the CC, and the
effect on subsequent instructions of commencing a branch instruction that
does not interrogate the CC.)

The 64 bit variations, BXHG and BXLEG, are a way to do this add-with-No-
carryout on the full register.
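
The 32 bit behavior quoted from the Principles of Operation can be sketched
in Python as follows. This is only an illustration under my own
simplifications: the helper names are invented, the three values are passed
as plain arguments rather than through the real register-pair convention,
and the condition code is simply not modeled, since the instructions leave
it unchanged.

```python
def to_signed32(value):
    """Wrap an integer into the 32-bit signed range (overflow is ignored)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def bxle(index, increment, comparand):
    """Return (new_index, branch_taken): branch when the sum is low or equal."""
    new_index = to_signed32(index + increment)  # overflow silently wraps
    return new_index, new_index <= to_signed32(comparand)


def bxh(index, increment, comparand):
    """Return (new_index, branch_taken): branch when the sum is high."""
    new_index = to_signed32(index + increment)
    return new_index, new_index > to_signed32(comparand)


# Ordinary array bumping: step 4, limit 12.
print(bxle(0, 4, 12))           # (4, True)
print(bxle(12, 4, 12))          # (16, False)
# Overflow wraps quietly instead of being reported.
print(bxle(0x7FFFFFFF, 1, 0))   # (-2147483648, True)
```

Used as a plain add, only the first element of the returned pair matters; the
branch outcome is thrown away because the target is the next instruction
either way.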

The BXH and BXLE instructions treat both operands as signed integers, and
in that respect are different from ALSIH/ALSIHN, which treat the immediate
operand as signed but the register operand as unsigned (logical).
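
For contrast, here is a rough Python sketch of the ALSIHN style of add. It
assumes, from my reading of the high-word instructions, that the 32 bit
signed immediate is added to the high order half of the 64 bit register,
that the register half is treated as unsigned, and that the low order half
is left alone. The function name and the MASK32 constant are made up here,
and the condition code is not modeled because ALSIHN does not change it.

```python
MASK32 = 0xFFFFFFFF


def alsihn(reg64, imm32):
    """Add a signed 32-bit immediate to the high word of a 64-bit value.
    The high word is treated as unsigned and any carry out is dropped;
    the low word passes through untouched."""
    high = (reg64 >> 32) & MASK32
    low = reg64 & MASK32
    new_high = (high + imm32) & MASK32  # modulo 2**32, no overflow signalled
    return (new_high << 32) | low


print(hex(alsihn(0x00000001_DEADBEEF, 1)))    # 0x2deadbeef
print(hex(alsihn(0x00000000_12345678, -1)))   # 0xffffffff12345678
```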
