For anyone interested in all the gory details of the 20-bit MSP430X extension, the 2006 edition of the MSP430x4xx family user's guide (slau056f.pdf) is now available at ti.com. (http://focus.ti.com/lit/ug/slau056f/slau056f.pdf)
It now has a chapter 4, "The 16-bit MSP430X CPU", explaining the new stuff.

Basically, all registers (except for R2) are now 20 bits wide, but 16-bit operations on them write zeros to the high-order bits. There are a number of new instructions, plus a prefix word that can be added to any pre-existing instruction to widen its 16-bit offsets to 20 bits and optionally select a 20-bit operand size. The addition of a prefix word is denoted by adding an "X" to the mnemonic, and such forms can take a ".A" (20-bit address) operand-size suffix. .A memory operands are stored as 32 bits (the top 12 bits are always written as zero), in little-endian order, and @Rn+ increments Rn by 4 for a .A operand. It wouldn't have taken any more memory space to just widen all the registers (except for R0 and R2) to 32 bits, but it would have cost a bunch of silicon to implement.

The C, V and N flags are set based on the msbit of the result, be that bit 7 (.B), bit 15 (.W) or bit 19 (.A).

Non-X opcodes using indexed addressing have one trick. For compatibility, if the base register is < 64K, the addition of the 16-bit offset wraps at 64K. But if the base register is >= 64K, the 16-bit offset is sign-extended and the sum doesn't wrap until 1M. None of this applies to 20-bit offsets.

The prefix word also has a "force carry-in to 0" bit, which can be used with DADD and RRC. RRC has two X forms:

    RRCX -> carry-in is normal
    RRUX -> "rotate right unsigned": carry-in is forced to 0

I don't see an obvious way to force the carry-in clear with DADDX.

Interrupts still take 2 words on the stack, as the high-order 4 bits of the PC are tucked into the unused high-order status register bits on the stack.
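To make the indexed-mode rule above concrete, here's a small C sketch of the effective-address calculation for a non-X instruction with a 16-bit offset. It models only what's described here, with 20-bit addresses held in a uint32_t; `ea_indexed16` is a made-up name, not anything from TI.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_MASK 0xFFFFFu   /* 20-bit (1 MB) address space */

/* Effective address of a non-X instruction using 16-bit indexed mode,
 * offset(Rn), following the rule described above:
 *   base <  64K: plain 16-bit sum, wraps at 64K (MSP430-compatible)
 *   base >= 64K: offset is sign-extended, sum wraps at 1M */
static uint32_t ea_indexed16(uint32_t base, uint16_t offset)
{
    assert(base <= ADDR_MASK);                  /* registers hold 20 bits */
    if (base < 0x10000u)
        return (base + offset) & 0xFFFFu;       /* stay within the low 64K */
    int32_t soff = (int16_t)offset;             /* sign-extend the offset */
    return (uint32_t)((int32_t)base + soff) & ADDR_MASK;
}
```

So with R5 = 0x0FFF0, 0x20(R5) comes out as 0x00010 (wrapped at 64K), while with R5 = 0x10000, -1(R5) (offset word 0xFFFF) comes out as 0x0FFFF.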
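Here's a C sketch of how a 20-bit PC can fit into that two-word interrupt frame. The exact layout is my assumption, not something stated above: PC bits 15..0 are pushed first (higher address), and PC bits 19..16 go into bits 15..12 of the word holding SR bits 11..0. Check the interrupt section of the guide before relying on it; `irq_frame`, `irq_push` and `irq_pop` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Two-word interrupt frame under the ASSUMED layout:
 *   frame[1] (pushed first, higher address): PC bits 15..0
 *   frame[0] (top of stack):                 PC bits 19..16 in bits 15..12,
 *                                            SR bits 11..0 in bits 11..0 */
typedef struct {
    uint16_t frame[2];
} irq_frame;

static irq_frame irq_push(uint32_t pc, uint16_t sr)
{
    assert(pc <= 0xFFFFFu);                     /* PC is 20 bits wide */
    irq_frame f;
    f.frame[1] = (uint16_t)(pc & 0xFFFFu);
    f.frame[0] = (uint16_t)((((pc >> 16) & 0xFu) << 12) | (sr & 0x0FFFu));
    return f;
}

/* Inverse operation, as RETI would undo it under the same assumption. */
static void irq_pop(irq_frame f, uint32_t *pc, uint16_t *sr)
{
    *sr = (uint16_t)(f.frame[0] & 0x0FFFu);
    *pc = ((uint32_t)(f.frame[0] >> 12) << 16) | f.frame[1];
}
```

Under that assumption, a PC of 0x5C123 with SR = 0x0008 would be stacked as 0xC123 followed, at the lower address, by 0x5008.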
The opcode map is now:

    0_0  MOVA @Rsrc, Rdst
    0_1  MOVA @Rsrc+, Rdst
    0_2  MOVA off20(PC), Rdst
    0_3  MOVA simm16(Rsrc), Rdst
    0_4  RxxM.A #n, Rdst
    0_5  RxxM.W #n, Rdst
    0_6  MOVA Rsrc, off20(PC)
    0_7  MOVA Rsrc, simm16(Rdst)
    0_8  MOVA, CMPA, ADDA, SUBA #imm20, Rdst
    0_c  MOVA, CMPA, ADDA, SUBA Rsrc, Rdst
    100  RRC   \
    108  SWPB   |
    110  RRA    |
    118  SXT    > Pre-existing
    120  PUSH   |
    128  CALL   |
    130  RETI  /
    134  CALLA
    138  CALLA
    13c  (reserved)
    14   PUSHM.A #n, Rsrc
    15   PUSHM.W #n, Rsrc
    16   POPM.A #n, Rdst
    17   POPM.W #n, Rdst
    18   X prefix
    20   JZ    \
    24   JNZ    |
    28   JNC    |
    2C   JC     |
    30   JN     |
    34   JGE    |
    38   JL     |
    3C   JMP    |
    4    MOV    |
    5    ADD    |
    6    ADDC   > Pre-existing
    7    SUBC   |
    8    SUB    |
    9    CMP    |
    A    DADD   |
    B    BIT    |
    C    BIC    |
    D    BIS    |
    E    XOR    |
    F    AND   /

The high-nybble-0 instructions are mostly the "address" instructions, which are shorter & faster aliases of equivalent MOVX.A, CMPX.A, ADDX.A and SUBX.A instructions.

MOVA (only) allows:

     1 1 1 1 1 1
     5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| Rsrc  |0 0 0 0| Rdst  |  MOVA @Rsrc, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| Rsrc  |0 0 0 1| Rdst  |  MOVA @Rsrc+, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| off_hi|0 0 1 0| Rdst  |  MOVA off20(PC), Rdst  (= MOVA LABEL, Rdst)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            off15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0|n-1|opc|0 1 0 w| Rdst  |  Bit shifts (see below)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| Rsrc  |0 1 1 0| off_hi|  MOVA Rsrc, off20(PC)  (= MOVA Rsrc, LABEL)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            off15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| Rsrc  |0 c 1 1| Rdst  |  MOVA simm16(Rsrc), Rdst  (c=0) and
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  MOVA Rsrc, simm16(Rdst)  (c=1)
    |           index15:0           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MOVA, CMPA, ADDA and SUBA all allow:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| imm_hi|1 0|opc| Rdst  |  opA #imm20, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            imm15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0| Rsrc  |1 1|opc| Rdst  |  opA Rsrc, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Opcodes are:

    00  MOVA src, Rdst
    01  CMPA src, Rdst
    10  ADDA src, Rdst
    11  SUBA src, Rdst

Tucked into the RETI.B space is CALLA. CALLA is like CALL, but pushes 32 bits of return address. "RETA" is an alias for "MOVA @SP+,PC", although you could use "MOVX.A @SP+,PC" as well, if you wanted.

     1 1 1 1 1 1
     5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|0 0|           |  RETI
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|0 1 0 0| Rsrc  |  CALLA Rsrc
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|0 1 0 1| Rsrc  |  CALLA simm16(Rsrc)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           index15:0           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|0 1 1 0| Rsrc  |  CALLA @Rsrc
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|0 1 1 1| Rsrc  |  CALLA @Rsrc+
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|1 0 0 0| abs_hi|  CALLA &abs20
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            abs15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|1 0 0 1| off_hi|  CALLA off20(PC)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            off15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|1 0 1 0|       |  Reserved
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|1 0 1 1| imm_hi|  CALLA #imm20
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            imm15:0            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 0 1 1|1 1|           |  Reserved
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

There are some new multi-bit register shift instructions. They take 1 cycle per bit, but allow a count of 1..4. They only operate on registers, with .W or .A operand size:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0|n-1|opc|0 1 0|w| Rdst  |  RxxM.w #n, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The immediate n (1..4) is encoded in the n-1 field. The w bit selects .A if 0 or .W if 1. The opcodes are:

    00  RRCM.w #n, Rdst   Rotate right through carry
    01  RRAM.w #n, Rdst   Rotate (shift) right arithmetic
    10  RLAM.w #n, Rdst   Rotate (shift) left
    11  RRUM.w #n, Rdst   Rotate (shift) right unsigned

(Higher repeat counts are available via RRCX, with a prefix word.)

PUSHM and POPM support 16- or 20-bit registers. Rdst specifies the HIGHEST register to work with.
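Going back to the opcode map: the high-nybble-0 group is easy to decode by switching on bits 7..4 of the first word. The sketch below just names each form, following the layouts given above (including this post's reading of the _2 and _6 forms as off20(PC)). `addr_insn_describe` is a hypothetical helper; it prints the 20-bit constant forms symbolically rather than fetching the extension word, and returns how many extension words follow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Describe a first instruction word whose top nybble is 0, using bits 7..4
 * to select the form as in the opcode map above. Returns the number of
 * 16-bit extension words that follow (0 or 1). */
static int addr_insn_describe(uint16_t w, char *buf, size_t len)
{
    static const char *const opa[4] = { "MOVA", "CMPA", "ADDA", "SUBA" };
    static const char *const rxx[4] = { "RRCM", "RRAM", "RLAM", "RRUM" };
    assert((w >> 12) == 0);          /* only the high-nybble-0 group */
    unsigned hi  = (w >> 8) & 0xFu;  /* Rsrc, n-1|opc, or bits 19..16 */
    unsigned sel = (w >> 4) & 0xFu;  /* form selector, bits 7..4 */
    unsigned lo  = w & 0xFu;         /* Rdst, or bits 19..16 */

    switch (sel) {
    case 0x0: snprintf(buf, len, "MOVA @R%u, R%u", hi, lo);        return 0;
    case 0x1: snprintf(buf, len, "MOVA @R%u+, R%u", hi, lo);       return 0;
    case 0x2: snprintf(buf, len, "MOVA off20(PC), R%u", lo);       return 1;
    case 0x3: snprintf(buf, len, "MOVA simm16(R%u), R%u", hi, lo); return 1;
    case 0x4: case 0x5:              /* w bit: 0 = .A, 1 = .W */
        snprintf(buf, len, "%s.%c #%u, R%u", rxx[hi & 3u],
                 sel == 0x4 ? 'A' : 'W', (hi >> 2) + 1u, lo);
        return 0;
    case 0x6: snprintf(buf, len, "MOVA R%u, off20(PC)", hi);       return 1;
    case 0x7: snprintf(buf, len, "MOVA R%u, simm16(R%u)", hi, lo); return 1;
    default:                         /* 1 c opc: c=0 #imm20, c=1 register */
        if (sel & 0x4u) {
            snprintf(buf, len, "%s R%u, R%u", opa[sel & 3u], hi, lo);
            return 0;
        }
        snprintf(buf, len, "%s #imm20, R%u", opa[sel & 3u], lo);
        return 1;
    }
}
```

For example, 0x018C decodes as "MOVA #imm20, R12" with one extension word (imm_hi = 1 sits in bits 11..8), and 0x095A decodes as "RRAM.W #3, R10".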
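Behaviourally, the four multi-bit shifts differ only in what gets shifted in. Here's a C sketch under the assumption that RxxM #n acts like n back-to-back single-bit steps, with C left holding the last bit shifted out; the other flags and the timing are ignored. `rxxm` and the `rxxm_op` enum are hypothetical names, with the enum values matching the opc field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* opc field values from the RxxM encoding above */
typedef enum { RRCM = 0, RRAM = 1, RLAM = 2, RRUM = 3 } rxxm_op;

/* Model RxxM.W (width 16) or RxxM.A (width 20) #n,Rdst as n single-bit
 * steps. Only the register value and C are modelled; C ends up holding
 * the last bit shifted out. A .W result leaves bits 19..16 clear. */
static void rxxm(rxxm_op op, unsigned width, unsigned n,
                 uint32_t *reg, bool *carry)
{
    assert(width == 16 || width == 20);
    assert(n >= 1 && n <= 4);                  /* the n-1 field is 2 bits */
    const uint32_t mask = (1u << width) - 1u;
    const uint32_t msb  = 1u << (width - 1u);
    uint32_t v = *reg & mask;                  /* .W sees only bits 15..0 */
    bool c = *carry;

    for (unsigned i = 0; i < n; i++) {
        bool out;
        switch (op) {
        case RLAM:                             /* 0 shifted into bit 0 */
            out = (v & msb) != 0;
            v = (v << 1) & mask;
            break;
        case RRCM:                             /* old C shifted into the msb */
            out = (v & 1u) != 0;
            v = (v >> 1) | (c ? msb : 0u);
            break;
        case RRAM:                             /* msb keeps its value */
            out = (v & 1u) != 0;
            v = (v >> 1) | (v & msb);
            break;
        default:                               /* RRUM: 0 shifted into the msb */
            out = (v & 1u) != 0;
            v >>= 1;
            break;
        }
        c = out;
    }
    *reg = v;
    *carry = c;
}
```

E.g. RRAM.W #2 on 0x8001 gives 0xE000 with C clear, and RRUM.W #4 on a register holding 0xF1234 gives 0x0123, with bits 19..16 cleared as for other .W operations.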
For PUSHM and POPM, the n-1 field specifies the number of registers. The lowest-numbered register is stored at the lowest address, i.e. the highest-numbered register is pushed first.

     1 1 1 1 1 1
     5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 1 0|w|  n-1  | Rsrc  |  PUSHM.w #n, Rsrc
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1 0 1 1|w|  n-1  | Rdst  |  POPM.w #n, Rdst
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

E.g. "PUSHM.W #5,R15" pushes R15..R11, and "POPM.W #5,R15" pops R11..R15.

Finally, the X prefix. This has two forms, depending on whether the following instruction's operands are all registers or not. I'll take the "not" case (at least one memory operand) first.

Non-register prefix instruction:

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0   0   0   1   1 |  source19:16  |A/L| 0   0 |   dest19:16   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The 4-bit fields extend the associated immediate constants. A/L combines with the following instruction's B/W bit to select the operand size:

    A/L  B/W  operand size
     0    0   reserved (.L = 32 bits in future, maybe?)
     0    1   .A = 20-bit
     1    0   .W = 16-bit
     1    1   .B = 8-bit

EXCEPTION: SWPBX.A and SXTX.A are encoded with B/W=0. Also note that the SXT.W and SXTX.W instructions extend the sign bit into all 20 bits; they are the ONLY .W instructions that do not clear bits 19..16 of the destination. In case you're wondering, SWPBX.A swaps the low-order bytes and leaves bits 19..16 unchanged (but clears bits 31..20 of a memory operand). No, I'm not sure why it's useful either.

Now, when all operands are registers, the prefix word has some more "interesting" options:

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0   0   0   1   1 | 0   0 | ZC| # |A/L| 0   0 |  (n-1) or Rn  |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    ZC  = Zero carry. Force the ALU carry input to 0; carry out is normal. (For DADD & RRC.)
    #   = Repeat count in a register (if 1) or immediate (if 0).
    A/L = Operand size, as above.
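Decoding the non-register prefix is mostly bit-picking. Here's a C sketch of the A/L + B/W table and the 20-bit constant extension. It assumes the following instruction's B/W bit is bit 6, as in ordinary MSP430 two-operand and single-operand instructions, and it ignores the SWPBX.A/SXTX.A exception; the function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SZ_RESERVED, SZ_A20, SZ_W16, SZ_B8 } x_opsize;

/* An X prefix word has 00011 in bits 15..11 (first byte 0x18..0x1F). */
static int is_x_prefix(uint16_t w) { return (w >> 11) == 0x03u; }

/* Operand size from the prefix's A/L bit (bit 6) and the next instruction's
 * B/W bit (assumed to be bit 6). The SWPBX.A / SXTX.A exception is not
 * handled here. */
static x_opsize x_operand_size(uint16_t prefix, uint16_t insn)
{
    static const x_opsize table[2][2] = {
        { SZ_RESERVED, SZ_A20 },   /* A/L = 0: B/W = 0, 1 */
        { SZ_W16,      SZ_B8  },   /* A/L = 1: B/W = 0, 1 */
    };
    assert(is_x_prefix(prefix));
    return table[(prefix >> 6) & 1u][(insn >> 6) & 1u];
}

/* Non-register form: prefix bits 10..7 and 3..0 supply bits 19..16 of the
 * source and destination constants that follow the instruction. */
static uint32_t x_extend_src(uint16_t prefix, uint16_t lo16)
{
    return ((uint32_t)((prefix >> 7) & 0xFu) << 16) | lo16;
}

static uint32_t x_extend_dst(uint16_t prefix, uint16_t lo16)
{
    return ((uint32_t)(prefix & 0xFu) << 16) | lo16;
}
```

For example, a prefix of 0x1A83 supplies 5 as bits 19..16 of the source constant and 3 for the destination, so a following source word 0x1234 becomes 0x51234.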
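For the register-form prefix, here's a matching C sketch that pulls out ZC and the repeat count. It takes the count as the 4-bit field (or the low 4 bits of Rn) plus one, i.e. 1..16 total executions as the repeat-count discussion explains, with one extra cycle per execution past the first. The register-file array and all names are hypothetical, and since it's unknown what R2 or R3 do as the count register, the sketch just reads whatever value is in regs[].

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned zc;     /* 1 = force ALU carry-in to 0 (the RRUX case) */
    unsigned reps;   /* total executions, 1..16 */
} x_repeat;

/* Decode the register-form X prefix:
 * bit 8 = ZC, bit 7 = # (1: count in Rn, 0: immediate), bits 3..0 = n-1 or Rn.
 * regs[] holds the current register values (hypothetical CPU model). */
static x_repeat x_decode_repeat(uint16_t prefix, const uint32_t regs[16])
{
    assert((prefix >> 11) == 0x03u);           /* must be an X prefix word */
    assert(((prefix >> 9) & 3u) == 0);         /* bits 10..9 are 0 in this form */
    unsigned field = prefix & 0xFu;
    x_repeat r;
    r.zc = (prefix >> 8) & 1u;
    if ((prefix >> 7) & 1u)
        r.reps = (unsigned)(regs[field] & 0xFu) + 1u;  /* low 4 bits of Rn */
    else
        r.reps = field + 1u;                           /* immediate n-1 */
    return r;
}

/* Extra cycles over a single execution: 1 per repetition past the first. */
static unsigned x_repeat_extra_cycles(x_repeat r)
{
    return r.reps - 1u;
}
```

Under this layout, 0x1843 (immediate field 3, A/L = 1) means four executions and three extra cycles, while 0x19C5 with R5 = 0x12347 means ZC set and eight executions.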
The ZC bit is currently only accessible via the RRUX mnemonic, although I'd think it could be useful for DADD as well.

The magic feature here is the repeat count. The repeat count is 1..16 (the first execution plus 0..15 more). It can come from either a 4-bit immediate field or the low 4 bits of a register. (No, I don't know what happens if you specify R2 or R3.) The default is, of course, a repeat count of #1, which is all zeros in the relevant bits.

In the assembler, a repeat count is expressed with a "REP #n" or "REP Rn" instruction on the line before the instruction to be repeated. That instruction must, of course, be an X instruction. (I don't know what happens if you try to code "ADDAX Rsrc,Rdst" or some such silliness. I can't see a case where there isn't an equally good documented alternative.)

Repeats add 1 cycle each (past the first) to the instruction execution time. Well, actually, the guide says 1 cycle per repetition in total, and I'm not sure which reading would apply in the event of a repeated add to PC (which normally takes 2 cycles).

Execution time:

They shaved cycles off in a few places, bringing the processor closer to the ideal of one cycle per memory word accessed. (Remember that .A memory operands count as two words.) Improvements over the MSP430 are:
- MOV, BIT and CMP with memory destinations no longer take a dummy second destination access cycle; they are now one (or two, for .A) cycles faster than ADD, SUB, etc.
- RETI is now 3 cycles (not 5), as good as possible.
- Taking an interrupt is now 5 cycles, not 6.
- PUSH @Rn takes 3 cycles, not 4.
- PUSH @Rn+ (including PUSH #imm) takes 3 cycles, not 5.
- CALL Rn takes 3 cycles, not 4.
- CALL @Rn+ (including CALL #imm) takes 4 cycles, not 5.
- PUSH and CALL off16(Rn) (including off16(PC) and absolute mode) take 4 cycles, not 5.
  Exception: off16(SP) still takes 5 cycles.

Places it got worse:
- MOV @Rn,PC takes 3 cycles, not 2.

Remaining exceptions to the one-word-per-cycle rule are:
- Branches still take 2 cycles (not 1), taken or not.
- PC (R0) as a destination adds an extra cycle to:
  - 2-operand and address instructions with a base time of 1 or 2 cycles (op Rsrc,PC takes 2 cycles; MOV #imm,PC takes 3), and
  - *all* 2-operand X instructions other than MOV, BIT and CMP.
  (If both apply, it's still just a 1-cycle penalty.)
- PUSH Rn takes 3 cycles (not 2).
- PUSH and CALL with an off16(SP) operand take 5 cycles (not 4).
- PUSHX and CALLA with an offset(SP) operand take an extra cycle.
- RxxM multi-bit shifts take 1 cycle per bit shifted.
- Repeated instructions take 1 cycle per repetition past the first.

Things that can't be done:
- CALLA off20(Rsrc). Can an X prefix be added to CALLA for this purpose?

Other notes:
- I don't see any discussion of the PUSH #4 and PUSH #8 errata.