# Register VM Design
Current Relay VM RFC (#2810) proposes stack-based VM design which uses push and
pop to maintain a unified stack. Though the design itself is simple to
implement, it is cumbersome in tasks such as dataflow analysis and enforces
certain orders in the execution.
I propose a register-based VM design as an alternative design. The main
difference is that register-based VM uses registers to designate the operands
and results instead of using the stack. Registers in the VM are virtual
registers and each is a reference to a `VMObject`. We assume there are infinite
registers so runtime doesn't need to worry about register spilling. Registers
are in SSA form where its index is unique in each VMFunction.
Calling convention in register VM is also simple. Each function has its own
local register file. The first `k` registers in the register file of callee
function are arguments passed in by caller function, and the VMObject in return
register is allocated by callee function instead of caller function to avoid
unnecessary memory copy. Caller function then assigns the reference to VMObject
pointed by return register into its own register (see detail in `Invoke`
instruction).
I summarize some pros and cons for register VM compared to stack VM.
Pros:
- Data dependency analysis is self-explaining as register reveals the data
dependency. In comparison, stack VM requires you to simulate the program and
recover the stack index in order to get data dependency. As a result, it'll be
easier to implement a dataflow executor or out-of-order executor.
- If/else scope is easier to handle. Previously stack VM needs to move the
objects to the right place in the stack given it enters if or else branch. Now
register VM only needs a new `Phi` instruction to select the result from either
branch.
- Tail recursion optimization should be easier since we only need to move the
register to the correct slot in the register file.
Cons:
- There is some memory overhead to keep the empty slots for registers that are
never used.
- Need to kill the register after its life cycle. This can be done by either
annotating the life cycle of each register or inserting the kill instructions
in the program during the life cycle analysis. Note that life cycle analysis
pass is needed by both stack VM and register VM.
## Instructions
We modify the current stack VM instructions to use registers as operands and
results.
Registers are designated by `$k` where `k` is the register index. `*reg`
indicates a list of registers.
### AllocTensor
```
AllocTenosr $1, ndim, *shape, dtype ; %1 = AllocTensor(ndim, *shape, dtype)
```
Allocate a tensor object and stores to register `$1`.
### AllocDatatype
```
AllocDatatype $1, tag, num_fields, *regs ; %1 = AllocDatatype(tag, num_fields,
*regs)
```
Allocate a datatype object and stores to register `$1`. `*regs` is a list of
registers containing fields in the datatype.
### AllocClosure
```
AllocClosure $1, func_index, num_freevars, *regs ; %1 =
AllocClosure(func_index, num_freevars, *regs)
```
Allocate a closure object, where there are `num_freevars` in `*regs`.
### LoadConst
```
LoadConst $1, const_index ; %1 = LoadConst(const_index)
```
Load the constant at `const_index` from the constant pool.
### Mov
```
Mov $2, $1 ; %2 = Mov(%1)
```
Create a reference to VMObject in register `$1` and stores it in `$2`.
Note: No data copy happens in this instruction. The real data copy should be a
PackedFunction.
### Phi
```
Phi $3, $1, $2 ; %3 = Phi(%1, %2)
```
Takes the `VMObject` either in `$1` or `$2` and stores in `$3`.
Note: This instruction requires VMObject in register `$1` and `$2` having the
same type, and only one of them should be valid during the runtime.
### Ret
```
Ret $1
```
Returns the register `$1`.
### GetField
```
GetField $2, $1, field_index ; %2 = GetField(%1, field_index)
```
Get the field at `field_index` in `$1`.
### If
```
If $1, true_offset, false_offset
```
Check if `$1` is true or false. If true relative jump by true_branch, else
relative jump by false_branch.
### Goto
```
Goto pc_offset
```
Relative unconditional jump by `pc_offset`.
### InvokePacked
```
InvokePacked packed_index, arity, output_size, *regs
```
Invoke the packed function denoted by `packed_index`
Note: Number of registers in `*regs` should be `arity + output_size` where
first `arity` registers are arguments and rest are output registers.
### Invoke
```
Invoke $1, func_index, arity, *regs ; %1 = Invoke(func_index, arity, *regs)
```
Invoke VM function at `func_index`
Note: Number of registers in `*regs` should be `arity`. Register `$1` will be a
reference to the VMObject in the return register.
### InvokeClosure
```
InvokeClosure $2, $1, arity, *regs ; %2 = InvokeClosure(%1, arity, *regs)
```
Invoke closure function in register `$2` and save the result into register `$2`.
Note: Register `$2` must be a `VMClosureObject`.
## Stack and State
In order to convert to register-based VM, we also need to adjust the data
structure in the VM. We assume there are infinite registers available, and each
function has its own register file. Each time an `Invoke` instruction is
executed, the runtime creates a new `VMFrame`. We pre-allocate max number of
registers used in the callee function (this number can be derived during
`VMCompile`) and assigns the arguments to the first `args` slots in the
registers.
```
struct VMFrame {
// Current function
size_t func_index;
// Pre-allocate max number of registers used in the functions
std::vector<VMObject> registers;
// Number of arguments
size_t args;
// Pointer into the current function's instructions.
const Instruction* code;
// Current program counter
size_t pc;
};
```
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2915