For anyone interested in all the gory details of the 20-bit MSP430X
extension, the 2006 edition of the MSP430x4xx family user's guide
(slau056f.pdf) is now available at ti.com.
(http://focus.ti.com/lit/ug/slau056f/slau056f.pdf)

It now has chapter 4, "The 16-bit MSP430X CPU" explaining the new stuff.

Basically, all registers (except for R2) are now 20 bits wide, but 16-bit
operations to them write zeros to the high-order bits.

There are a number of new instructions, and a prefix word that can be
added to any previous instruction to widen the 16-bit offsets to 20 bits
and optionally select a 20-bit operand size.

The addition of a prefix word is denoted by adding an "X" to the mnemonic,
and such forms can take a ".A" (20-bit address) operand size suffix.
.A memory operands are stored as 32 bits (the top 12 bits always written
as zero), in little-endian order.  @Rn+ increments Rn by 4.
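
To make the .A memory layout concrete, here is a small Python sketch.
The helper names store_a and load_a are made up for illustration (they
are not from the TI docs); all it does is mask to 20 bits and pack the
value as a little-endian 32-bit quantity:

```python
import struct

def store_a(value):
    """Return the 4 bytes a .A memory write would produce."""
    # Only bits 19..0 survive; bits 31..20 are written as zero.
    return struct.pack("<I", value & 0xFFFFF)

def load_a(raw):
    """Read a .A operand back from 4 bytes of memory."""
    return struct.unpack("<I", raw)[0] & 0xFFFFF

# 0x12345 lands in memory as the bytes 45 23 01 00.
print(store_a(0x12345).hex())
```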

It wouldn't have taken any more memory space to just widen all the
registers (except for R0 and R2) to 32 bits, but it would have cost a
bunch of silicon to implement.

The C, V and N flags are set based on the msbit of the result, be that
bit 7 (.B), bit 15 (.W) or bit 19 (.A).
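
As a rough illustration of that flag rule, the sketch below models an
ADD at each operand size.  The function name and the WIDTH table are
my own, and the overflow test is the usual two's-complement formula
rather than anything quoted from the user's guide:

```python
WIDTH = {".B": 8, ".W": 16, ".A": 20}

def add_flags(a, b, size):
    """Model ADD at the given operand size; return (result, flags)."""
    bits = WIDTH[size]
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)          # bit 7, 15 or 19
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "N": bool(result & msb),   # sign = msbit of the result
        "Z": result == 0,
        "C": total > mask,         # carry out of the msbit
        # Overflow: operands agree in sign, result disagrees.
        "V": bool(~(a ^ b) & (a ^ result) & msb),
    }
    return result, flags
```

For example, add_flags(0x7FFFF, 1, ".A") overflows into bit 19, so N
and V come back set and C clear.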

Non-X opcodes using indexed addressing have one trick: For compatibility,
if the base register is <64K, the addition of the 16-bit offset wraps
at 64K.  But if the base register is >= 64K, the 16-bit offset is
sign-extended and the sum doesn't wrap until 1M.  None of this applies
to 20-bit offsets.
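
Here is how I read that rule, as a Python sketch.  The function name
is hypothetical and this only models the address arithmetic described
above for non-X indexed operands, nothing else about the access:

```python
def indexed_address(base, off16):
    """Effective address of off16(Rn) for a non-X instruction."""
    base &= 0xFFFFF
    off16 &= 0xFFFF
    if base < 0x10000:
        # Compatibility case: the sum wraps at 64K.
        return (base + off16) & 0xFFFF
    # Base at or above 64K: sign-extend the offset, wrap at 1M.
    signed = off16 - 0x10000 if off16 & 0x8000 else off16
    return (base + signed) & 0xFFFFF

print(hex(indexed_address(0xFFF0, 0x0020)))   # 0x10
print(hex(indexed_address(0x10000, 0xFFFE)))  # 0xfffe
```

Note the second case: with the base at exactly 64K, an offset of -2
reaches back below 64K, which the wrapping rule would not allow.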

The prefix word also has a "force carry-in to 0" bit, which can be used
with DADD and RRC.  RRC has two X forms:
RRCX    -> carry is normal
RRUX    -> "rotate right unsigned" carry is forced to 0

I don't see an obvious way to force the carry-in clear with DADDX.


Interrupts still take 2 words on the stack, as the high-order 4 bits
of the PC are tucked into the unused high-order status register bits on
the stack.
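
A sketch of that stack frame in Python.  I'm assuming the PC bits go
into bits 15..12 of the saved SR word (the four bits the status
register doesn't use); the function names are just for illustration:

```python
def interrupt_frame(pc, sr):
    """Return (pc_word, sr_word) as pushed when an interrupt is taken."""
    pc_word = pc & 0xFFFF
    # Assumption: PC bits 19..16 ride in bits 15..12 of the SR word.
    sr_word = (((pc >> 16) & 0xF) << 12) | (sr & 0x0FFF)
    return pc_word, sr_word

def reti_restore(pc_word, sr_word):
    """Rebuild the 20-bit PC and the SR from the two stacked words."""
    pc = (((sr_word >> 12) & 0xF) << 16) | (pc_word & 0xFFFF)
    return pc, sr_word & 0x0FFF

print([hex(w) for w in interrupt_frame(0x12345, 0x0008)])
```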

The opcode map is now:
0_0     MOVA @Rsrc, Rdst
0_1     MOVA @Rsrc+, Rdst
0_2     MOVA &abs20, Rdst
0_3     MOVA imm16(Rsrc), Rdst
0_4     RxxM.A #imm2,Rdst
0_5     RxxM.W #imm2,Rdst
0_6     MOVA Rsrc, &abs20
0_7     MOVA Rsrc, imm16(Rdst)
0_8     MOVA, CMPA, ADDA, SUBA #imm20,Rdst
0_c     MOVA, CMPA, ADDA, SUBA Rsrc,Rdst
100     RRC     \
108     SWPB     \
110     RRA       \
118     SXT        > Pre-existing
120     PUSH      /
128     CALL     /
130     RETI    /
134     CALLA
138     CALLA
13c     (res)
14      PUSHM.A #imm4,Rsrc
15      PUSHM.W #imm4,Rsrc
16      POPM.A #imm4,Rdst
17      POPM.W #imm4,Rdst
18      X prefix
20      JZ      \
24      JNZ      \
28      JNC       \
2C      JC         \
30      JN          \
34      JGE          \
38      JL            \
3C      JMP            \
4       MOV             \
5       ADD              \ Pre-existing
6       ADDC             /
7       SUBC            /
8       SUB            /
9       CMP           /
A       DADD         /
B       BIT         /
C       BIC        /
D       BIS       /
E       XOR      /
F       AND     /

The high-nybble-0 instructions are mostly the "address" instructions,
which are shorter & faster aliases of equivalent MOVX.A, CMPX.A, ADDX.A
and SUBX.A instructions.

MOVA (only) allows:
 1 1 1 1 1 1
 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|  Rsrc |0 0 0 0|  Rdst | MOVA @Rsrc, Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|  Rsrc |0 0 0 1|  Rdst | MOVA @Rsrc+, Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0| abs_hi|0 0 1 0|  Rdst | MOVA &abs20, Rdst (= MOVA &LABEL, Rdst)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           abs15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|n-1|opc|0 1 0 w|  Rdst | Bit-shifts (see below)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|  Rsrc |0 1 1 0| abs_hi| MOVA Rsrc, &abs20 (= MOVA Rsrc, &LABEL)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           abs15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|  Rsrc |0 c 1 1|  Rdst | MOVA simm16(Rsrc),Rdst (c=0) and
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ MOVA Rsrc,simm16(Rdst) (c=1)
|          index15:0            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MOVA, CMPA, ADDA, SUBA all allow:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0| imm_hi|1 0|opc|  Rdst | opA #imm20, Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           imm15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|  Rsrc |1 1|opc|  Rdst | opA Rsrc, Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Opcodes are:
00      MOVA src,Rdst
01      CMPA src,Rdst
10      ADDA src,Rdst
11      SUBA src,Rdst
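
The two formats above are simple enough to assemble by hand.  A
Python sketch (the encoder names are mine, and it does no range
checking on the register numbers):

```python
OPC = {"MOVA": 0b00, "CMPA": 0b01, "ADDA": 0b10, "SUBA": 0b11}

def encode_imm(op, imm20, rdst):
    """opA #imm20, Rdst -> [first word, imm15:0]."""
    imm20 &= 0xFFFFF
    first = ((imm20 >> 16) << 8) | (0b10 << 6) | (OPC[op] << 4) | rdst
    return [first, imm20 & 0xFFFF]

def encode_reg(op, rsrc, rdst):
    """opA Rsrc, Rdst -> [single word]."""
    return [(rsrc << 8) | (0b11 << 6) | (OPC[op] << 4) | rdst]

# ADDA #0x12345, R5 -> 0x01a5 0x2345
print([hex(w) for w in encode_imm("ADDA", 0x12345, 5)])
# MOVA R4, R5 -> 0x4c5
print([hex(w) for w in encode_reg("MOVA", 4, 5)])
```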


Tucked in to the RETI.B space is CALLA.  CALLA is like CALL, but pushes
32 bits of return address.  "RETA" is an alias for "MOVA @SP+,PC",
although you could use "MOVX.A @SP+,PC" as well, if you wanted.

 1 1 1 1 1 1
 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|0 0            | RETI
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|0 1 0 0|  Rsrc | CALLA Rsrc
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|0 1 0 1|  Rsrc | CALLA simm16(Rsrc)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          index15:0            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|0 1 1 0|  Rsrc | CALLA @Rsrc
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|0 1 1 1|  Rsrc | CALLA @Rsrc+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|1 0 0 0| abs_hi| CALLA &abs20
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           abs15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|1 0 0 1| off_hi| CALLA off20(PC)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           off15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|1 0 1 0|       | Reserved
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|1 0 1 1| imm_hi| CALLA #imm20
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           imm15:0             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 0 1 1|1 1    |       | Reserved
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
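
For anyone writing a disassembler, the 0x13xx group above boils down
to a lookup on bits 7..4.  A minimal Python sketch (table and function
names are my own; the second tuple element is the instruction length
in words):

```python
CALLA_MODES = {
    0x4: ("CALLA Rsrc", 1),
    0x5: ("CALLA simm16(Rsrc)", 2),
    0x6: ("CALLA @Rsrc", 1),
    0x7: ("CALLA @Rsrc+", 1),
    0x8: ("CALLA &abs20", 2),
    0x9: ("CALLA off20(PC)", 2),
    0xB: ("CALLA #imm20", 2),
}

def decode_13xx(word):
    """Classify a first instruction word of the form 0x13xx."""
    if (word >> 8) != 0x13:
        raise ValueError("not in the 0x13xx group")
    mode = (word >> 4) & 0xF
    if mode < 0x4:
        # Bits 7..6 = 00: the pre-existing RETI encoding.
        return ("RETI", 1)
    # Anything not listed (0xA, 0xC..0xF) is reserved.
    return CALLA_MODES.get(mode, ("reserved", 1))
```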


There are some new multi-bit register shift instructions.
They take 1 cycle per bit, but allow a count of 1..4.
They only operate on .W or .A registers:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|n-1|opc|0 1 0|w|  Rdst | RxxM.w #n,Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The immediate n (1..4) is encoded in the n-1 field.
The w bit selects .A if 0 or .W if 1.
The opcodes are:
00      RRCM.w #n,Rdst  Rotate right through carry
01      RRAM.w #n,Rdst  Rotate (shift) right arithmetic
10      RLAM.w #n,Rdst  Rotate (shift) left
11      RRUM.w #n,Rdst  Rotate (shift) right unsigned

(Higher repeat counts are available via RRCX, with
a prefix word.)
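
Putting the fields together, an encoder for these shifts is only a
few lines of Python.  The names are hypothetical; the bit positions
come straight from the diagram above:

```python
SHIFT_OPC = {"RRCM": 0b00, "RRAM": 0b01, "RLAM": 0b10, "RRUM": 0b11}

def encode_shift(op, n, rdst, size=".W"):
    """Encode RxxM.size #n,Rdst as a single instruction word."""
    if not 1 <= n <= 4:
        raise ValueError("shift count must be 1..4")
    w = {".A": 0, ".W": 1}[size]
    return ((n - 1) << 10) | (SHIFT_OPC[op] << 8) | (0b010 << 5) | (w << 4) | rdst

print(hex(encode_shift("RRUM", 4, 7, ".W")))   # 0xf57
print(hex(encode_shift("RLAM", 1, 12, ".A")))  # 0x24c
```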

PUSHM and POPM support 16- or 20-bit registers.
Rdst specifies the HIGHEST register to work with.
The n-1 field holds the number of registers, minus one (so n = 1..16).
The lowest-numbered register is stored in the lowest address,
i.e. the highest-numbered register is pushed first.
 1 1 1 1 1 1
 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 1 0|w|  n-1  |  Rsrc | PUSHM.w #n,Rsrc
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1 0 1 1|w|  n-1  |  Rdst | POPM.w #n,Rdst
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

E.g. "PUSHM.w #5,R15" pushes R15..R11.
"POPM.w #5,R15" pops R11..R15.
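
In Python terms, the register ordering and the PUSHM encoding look
like this.  The function names are made up, and I've only encoded
PUSHM here since that's the one I've checked against the opcode map
(0x14xx = .A, 0x15xx = .W):

```python
def pushm_order(n, rhigh):
    """Registers in the order PUSHM #n,Rhigh pushes them."""
    return list(range(rhigh, rhigh - n, -1))

def popm_order(n, rhigh):
    """Registers in the order POPM #n,Rhigh pops them."""
    return list(range(rhigh - n + 1, rhigh + 1))

def encode_pushm(n, rsrc, size=".W"):
    """Encode PUSHM.size #n,Rsrc as a single instruction word."""
    if not 1 <= n <= 16:
        raise ValueError("register count must be 1..16")
    w = {".A": 0, ".W": 1}[size]
    return (0b0001010 << 9) | (w << 8) | ((n - 1) << 4) | rsrc

print(pushm_order(5, 15))             # [15, 14, 13, 12, 11]
print(popm_order(5, 15))              # [11, 12, 13, 14, 15]
print(hex(encode_pushm(5, 15, ".W"))) # 0x154f
```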


Finally, the X prefix.  This has two forms, depending on whether the
following instruction's operands are all registers or not.  I'll
take the "not" case (at least one memory operand) first:

Non-register prefix instruction:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0   0   0   1   1 |  source19:16  |A/L| 0   0 |   dest19:16   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The 4-bit fields supply bits 19..16 of the associated immediates,
offsets or addresses.
A/L combines with the following B/W bit to select the operand size:
A/L     B/W     operand size
0       0       reserved (.L = 32 bits in future, maybe?)
0       1       .A = 20-bit
1       0       .W = 16-bit
1       1       .B = 8-bit

EXCEPTION: SWPBX.A and SXTX.A are encoded with B/W=0.
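
A quick Python sketch of building this prefix and decoding the
operand size.  The names are hypothetical, and I'm taking B/W to be
bit 6 of the following opcode word, as in the existing one- and
two-operand formats.  It does not handle the SWPBX.A/SXTX.A exception:

```python
SIZES = {(0, 0): "reserved", (0, 1): ".A", (1, 0): ".W", (1, 1): ".B"}

def mem_prefix(src_hi, dst_hi, al):
    """Build the non-register form of the extension word."""
    return (0b00011 << 11) | ((src_hi & 0xF) << 7) | ((al & 1) << 6) | (dst_hi & 0xF)

def operand_size(prefix, opcode):
    """Combine A/L (prefix bit 6) with B/W (opcode bit 6)."""
    al = (prefix >> 6) & 1
    bw = (opcode >> 6) & 1
    return SIZES[(al, bw)]

print(hex(mem_prefix(0, 0, 1)))  # 0x1840, the plain ".W/.B" prefix
print(hex(mem_prefix(1, 2, 0)))  # 0x1882
```

So a MOV.B opcode word (B/W=1) behind a prefix with A/L=0 becomes a
MOVX.A.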

Also note that SXT.W and SXTX.W instructions extend the
sign bit into all 20 bits; they are the ONLY .W instructions
that do not clear bits 16..19 of the destination.

In case you're wondering, SWPBX.A swaps the low-order bytes and
leaves bits 19..16 unchanged (but clearing bits 31..20 on
a memory operand).  No, I'm not sure why it's useful either.


Now, when all operands are registers, the prefix word has some
more "interesting" options:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0   0   0   1   1 | 0   0 | ZC| # |A/L| 0   0 |  (n-1) or Rn  |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
ZC = Zero carry.  Force ALU carry input to 0.  Carry out is normal.
     (For DADD & RRC)
# = Repeat count in register (if 1) or immediate (if 0).
A/L = Operand size, as above.

The ZC bit is currently only accessible via the RRUX mnemonic,
although I'd think it could be useful to DADD as well.

The magic feature here is the repeat count.  The repeat count
is 1..16 (first execution plus 0..15 more).  It can come from
either a 4-bit immediate field, or the low 4 bits of a register.
(No, I don't know what happens if you specify R2 or R3.)

The default is, of course, a repeat count of #1, which is all
zeros in the relevant bits.
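
The register-form prefix can be sketched the same way.  This is only
my reading of the diagram above; reg_prefix and its parameters are
invented names, and it doesn't try to guess what R2 or R3 would do:

```python
def reg_prefix(al, zc=0, count=1, count_reg=None):
    """Build the register form of the extension word.

    count     -- immediate repeat count, 1..16 (stored as count-1)
    count_reg -- if given, register number holding the count instead
    """
    if count_reg is not None:
        rep_bit, low = 1, count_reg & 0xF
    else:
        if not 1 <= count <= 16:
            raise ValueError("repeat count must be 1..16")
        rep_bit, low = 0, count - 1
    return ((0b00011 << 11) | ((zc & 1) << 8) | (rep_bit << 7)
            | ((al & 1) << 6) | low)

print(hex(reg_prefix(al=1)))                 # 0x1840, no repeat
print(hex(reg_prefix(al=0, zc=1, count=4)))  # 0x1903
print(hex(reg_prefix(al=1, count_reg=5)))    # 0x18c5
```

The first case shows the "all zeros" default: a count of #1 with ZC
clear leaves only the A/L bit set below the 00011 tag.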


In the assembler, a repeat count is expressed with a "REP #n" or
"REP Rn" instruction on the line before the instruction to be
repeated.  It must, of course, be an X instruction.
(I don't know what happens if you try to code "ADDAX Rsrc,Rdst"
or some such silliness.  I can't see a case where there isn't
an equally-good documented alternative.)

Repeats add 1 cycle each (past the first) to the instruction
execution time.   Well, actually, it says 1 cycle per total
repeat, but I'm not sure which would apply in the event of
a repeated add to PC (which normally takes 2 cycles).


Execution time:
They shaved a cycle off in a few places, bringing the processor closer to
the ideal of one cycle per memory word accessed.  (Remember that .A
memory operands count as two words.)

Improvements over the MSP430 are:
- MOV, BIT, and CMP with memory destinations no longer take a
  dummy second destination access cycle; they are now one
  (or two, for .A) cycle faster than ADD, SUB, etc.
- RETI is now 3 cycles (not 5), as good as possible.
- Taking an interrupt is now 5 cycles, not 6.
- PUSH @Rn takes 3 cycles, not 4.
- PUSH @Rn+ (including PUSH #imm) takes 3 cycles, not 5.
- CALL Rn takes 3 cycles, not 4.
- CALL @Rn+ (including CALL #imm) takes 4 cycles, not 5.
- PUSH and CALL off16(Rn) (including off16(PC) and absolute mode) take 4
  cycles, not 5.  Exception: off16(SP) still takes 5 cycles.

Places it got worse:
- MOV @Rn,PC takes 3 cycles, not 2

Remaining exceptions to the one-word-per-cycle rule are:

- Branches still take 2 cycles (not 1), taken or not.
- PC (R0) as a destination adds an extra cycle to:
  - 2-operand and address instructions with a base time of 1 or 2 cycles.
    (op Rsrc, PC takes 2 cycles; MOV #imm,PC takes 3), and
  - *all* 2-operand X instructions other than MOV, BIT and CMP
  (If both apply, it's still just a 1-cycle penalty.)
- PUSH Rn takes 3 cycles (not 2)
- PUSH and CALL with an off16(SP) operand take 5 cycles (not 4)
- PUSHX and CALLA with an offset(SP) operand take an extra cycle.

- RxxM multi-bit shifts take 1 cycle per bit shifted.
- Repeated instructions take 1 cycle per repetition past 1.


Things that can't be done:

- CALLA off20(Rsrc).  Can an X prefix be added to CALLA for this purpose?

Other notes:
- I don't see any discussion of the PUSH #4 and PUSH #8 errata.
