We probably have not seen the end of the instructions with an -N suffix. Many of the new z/Architecture instructions reflect PowerPC development work that was keen on preventing instruction pipeline stalls.
A prior poster (Gary Weinhold, Mon, 13 Dec 2010 11:29:23 -0800) suggested that the idea here might be to "make some microcoded instruction execute in fewer cycles". That is a fair interpretation, but in RISC design the issue is often the scheduling of subsequent instructions. We do not want the subsequent instructions to wait, unless necessary, for the current instruction to finish before we start them in parallel. Updating anything like a status register or the PSW to reflect the results of an instruction takes time for the current instruction (as Gary has implicitly suggested). Yet since a subsequent instruction might depend upon that PSW update, it may have to wait. The really esoteric issue here is that even subsequent branch instructions can get a sneaky head start: a prior instruction that is not yet done can still be executing while we grab and begin to evaluate the PSW condition code bits. (In other words, parallelism is not just concerned with arith:arith simultaneity, but also with arith:branch simultaneity.) We can hurry the branch instructions and the new load on condition instructions (LOCx instructions) if we make it clear that the current instruction has no intention of updating the condition code.
In the Power instruction mnemonics a dot suffix designated the intent to record the status of the result in the condition register (dot present = record, dot absent = no record). In z/Architecture the suffix -N means No-carryout. No-carryout then actually means No-delay, that is, N = No-pipeline-stall. Lots of instructions on the Power Architecture had a record bit, designated "Rc" in the documentation, which could be on or off in the instruction itself. This bit determines whether the condition register is updated to reflect the result. There was a similar overflow enable bit (OE) that permitted overflow to be recorded optionally in XER[OV].
Some instructions on PPC came in sets of four varieties, with a dot suffix indicating that the condition register was to be updated, and/or an o suffix marking the intent to record any overflow. So the add instruction had the forms add, add., addo and addo.
"Instructions that select the overflow option (enable XER[OV]) or that set the XER carry bit (CA) may delay execution of subsequent instructions." See "PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, G522-0290-01", https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2/$file/6xx_pem.pdf
These chips had an add immediate instruction, addi, which did not update the carry bit or the condition register. The document states: "The addi instruction is preferred for addition because it sets few status bits."
This optional-update capability is a great adventure on PPC machines: the Rc bit of an instruction is available on all of the logical operations (and, nand, or, nor, xor and eqv), and can also be engaged on rotate and shift instructions. A variation on this scheme can be used on some Power floating point instructions. The Rc field is present on the negate instruction (form two's complement), on some multiply instructions and even on divide instructions! It is not a minor theme on Power chips. Perhaps compiler writers found some of this really useful.
On the z/chips, the difference between the opcode for ALSIH v ALSIHN is just one bit, x'CCA' v x'CCB'. Could we see this feature proliferate within z/Architecture? We can only speculate at this time. An important issue is which instruction architecture aspects from the RISC world will really be needed on the big box to keep it very competitive. It is the instruction pipeline in the internal RISC CPUs now onboard the big blue wafers that is at issue here. Preventing unnecessary stalls is utterly important in RISC. Compilers can exploit parallelism.
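To make the "just one bit" point concrete, here is a rough Python sketch. It is not taken from the cited manuals: it assumes the standard PowerPC XO-form layout for the integer add (primary opcode 31, extended opcode 266, with OE and Rc as single bits), and the helper names encode_ppc_add, ppc_add_family and differing_bits are made up for illustration. The ALSIH and ALSIHN opcode values are the x'CCA' and x'CCB' mentioned above.

```python
# Sketch only: field positions assume the standard PowerPC XO-form layout
# (primary opcode 31, extended opcode 266 for the integer add family).

def encode_ppc_add(rd, ra, rb, oe=0, rc=0):
    """Build the 32-bit word for add/add./addo/addo. (hypothetical helper)."""
    return ((31 << 26) | (rd << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (266 << 1) | rc)

# Mnemonic -> (OE, Rc): the o suffix turns on OE, the dot turns on Rc.
PPC_ADD_VARIANTS = {
    "add": (0, 0),
    "add.": (0, 1),
    "addo": (1, 0),
    "addo.": (1, 1),
}

def ppc_add_family(rd, ra, rb):
    """Return the four encodings of the same add, keyed by mnemonic."""
    return {m: encode_ppc_add(rd, ra, rb, oe, rc)
            for m, (oe, rc) in PPC_ADD_VARIANTS.items()}

# z/Architecture opcodes as quoted in the post.
ALSIH_OPCODE = 0xCCA
ALSIHN_OPCODE = 0xCCB

def differing_bits(a, b):
    """Count the bit positions in which two opcodes differ."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    for mnemonic, word in ppc_add_family(3, 4, 5).items():
        print(f"{mnemonic:6s} r3,r4,r5 -> {word:08X}")
    print("ALSIH v ALSIHN differing bits:",
          differing_bits(ALSIH_OPCODE, ALSIHN_OPCODE))
```

Under those assumptions, add r3,r4,r5 encodes as 7C642A14 and addo. r3,r4,r5 as 7C642E15, so the four PPC forms differ only in the OE and Rc bits, much as ALSIH and ALSIHN differ in a single opcode bit.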
An important one of all the No-stall instructions, perhaps, was the add immediate, as it is one of the two obvious ways to bump along in arrays and data structures, the other being an RR add. It is not clear to this author if the No-carryout logical operations (xor, and, etc.) are nearly so critical to pipeline optimization, much less the multiply instructions, et al. However, avoiding pipeline stalls is the essence of competition in the RISC device business. It really is the pavement along which RISC racers speed. Instruction parallelism is the holy grail for future machines. So we probably will see at least some more N-suffixes, or some kind of No-pipeline-stall instruction mark.
Putting the pipeline stall issue to the side, it is also noteworthy that ignoring the overflow in array bumping is really not all that new. Consider the Branch on Index High and Branch on Index Low or Equal instructions. According to "z/Architecture Principles of Operation, Document Number SA22-7832-02", for the venerable 12 bit displacement variations and the new 20 bit displacement variations:
"For purposes of the addition and comparison, all operands and results are treated as 32-bit signed binary integers for BXH and BXLE or as 64-bit signed binary integers for BXHG and BXLEG. Overflow caused by the addition is ignored."
And for the nifty 16 bit relative address variations:
"For purposes of the addition and comparison, all operands and results are treated as 32-bit signed binary integers for BRXH and BRXLE or as 64-bit signed binary integers for BRXHG and BRXLG. Overflow caused by the addition is ignored."
For both:
"Condition Code: The code remains unchanged."
Don Higgins (Sat, 11 Dec 2010 15:05:00 -0500) wrote: "I have no idea why this single add instruction with no cc update was added. If I had to pick just one, it seems like low order 32 bits or all 64 bit add would have been more useful for adjusting indexes etc."
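The quoted rule for the 32-bit forms (signed add, overflow ignored, condition code untouched) can be sketched in a few lines of Python. This is only a model of the arithmetic and the comparison as quoted: the function names are hypothetical, the increment and comparand are passed in directly rather than being fetched from a register pair, and no condition code is modeled because the instructions leave it unchanged.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret the low 32 bits of value as a signed two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def bxle_step(index, increment, comparand):
    """Model one BXLE: add, ignore overflow, branch if sum <= comparand.

    Returns (new_index, branch_taken). Hypothetical helper, not a real API.
    """
    new_index = to_signed32(index + increment)  # overflow wraps silently
    return new_index, new_index <= to_signed32(comparand)

def bxh_step(index, increment, comparand):
    """Model one BXH: add, ignore overflow, branch if sum > comparand."""
    new_index = to_signed32(index + increment)
    return new_index, new_index > to_signed32(comparand)

if __name__ == "__main__":
    # Walk a 4-byte-element array from offset 0 through offset 16.
    offset, again = 0, True
    while again:
        print("processing offset", offset)
        offset, again = bxle_step(offset, 4, 16)
```

Note the wrap case: bumping 0x7FFFFFFF by 1 gives the most negative 32-bit value with no exception and no status recorded, which is the "overflow is ignored" behavior in the quote.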
The 32 bit variations, BXH and BXLE, are a way to do this add-with-No-carryout on the low order half of the general purpose registers. In this case you would have to code the branch to go nowhere else than to the next instruction, willy-nilly. I have never seen it done, but it would achieve the result of No-carryout and, I think, would also not necessarily stall subsequent instructions. (But here I am mixing two subjects: the effect on subsequent instructions of an instruction that updates the CC, and the effect on subsequent instructions of commencing a branch instruction that does not interrogate the CC.) The 64 bit variations, BXHG and BXLEG, are a way to do this add-with-No-carryout on the full register. The BXH and BXLE instructions treat both operands as signed integers, and in that respect are different from ALSIH/ALSIHN, which treat the immediate operand as signed but the other operands as unsigned (logical).
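For comparison, here is a small Python sketch of the register effect of ALSIHN as I understand it: a signed 32-bit immediate is added to the high-order word of a 64-bit register, the sum is kept as an unsigned (logical) 32-bit value, and the low-order word is left alone. The function name alsihn is hypothetical, and the condition code that ALSIH would set is deliberately not modeled.

```python
MASK32 = 0xFFFFFFFF

def alsihn(gr, imm):
    """Model the register effect of ALSIHN on a 64-bit register value.

    gr  : 64-bit register contents as a non-negative Python int
    imm : signed 32-bit immediate, -2**31 .. 2**31 - 1
    Returns the new 64-bit register contents. Hypothetical helper; the
    condition code is not modeled because ALSIHN leaves it unchanged.
    """
    if not -(1 << 31) <= imm < (1 << 31):
        raise ValueError("immediate must fit in a signed 32-bit field")
    high = (gr >> 32) & MASK32       # bits 0-31 in z/Architecture numbering
    low = gr & MASK32                # bits 32-63, not touched
    new_high = (high + imm) & MASK32  # logical add, carry-out discarded
    return (new_high << 32) | low

if __name__ == "__main__":
    print(f"{alsihn(0x00000001DEADBEEF, 1):016X}")
```

The sum bits are the same ones a signed add would produce; the signed v unsigned distinction only shows up in what the condition code (for ALSIH) or the comparison (for BXH and BXLE) makes of them.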
