[dmlc/tvm] [RFC] Register Relay VM design (#2915)

Haichen Shen Wed, 27 Mar 2019 10:53:07 -0700

# Register VM Design
Current Relay VM RFC (#2810) proposes stack-based VM design which uses push and 
pop to maintain a unified stack. Though the design itself is simple to 
implement, it is cumbersome in tasks such as dataflow analysis and enforces 
certain orders in the execution.


I propose a register-based VM design as an alternative design. The main 
difference is that register-based VM uses registers to designate the operands 
and results instead of using the stack. Registers in the VM are virtual 
registers and each is a reference to a `VMObject`. We assume there are infinite 
registers so runtime doesn't need to worry about register spilling. Registers 
are in SSA form where its index is unique in each VMFunction. 

Calling convention in register VM is also simple. Each function has its own 
local register file. The first `k` registers in the register file of callee 
function are arguments passed in by caller function, and the VMObject in return 
register is allocated by callee function instead of caller function to avoid 
unnecessary memory copy. Caller function then assigns the reference to VMObject 
pointed by return register into its own register (see detail in `Invoke` 
instruction).

I summarize some pros and cons for register VM compared to stack VM.

Pros:
- Data dependency analysis is self-explaining as register reveals the data 
dependency. In comparison, stack VM requires you to simulate the program and 
recover the stack index in order to get data dependency. As a result, it'll be 
easier to implement a dataflow executor or out-of-order executor.
- If/else scope is easier to handle. Previously stack VM needs to move the 
objects to the right place in the stack given it enters if or else branch. Now 
register VM only needs a new `Phi` instruction to select the result from either 
branch.
- Tail recursion optimization should be easier since we only need to move the 
register to the correct slot in the register file.

Cons:
- There is some memory overhead to keep the empty slots for registers that are 
never used.
- Need to kill the register after its life cycle. This can be done by either 
annotating the life cycle of each register or inserting the kill instructions 
in the program during the life cycle analysis. Note that life cycle analysis 
pass is needed by both stack VM and register VM.

## Instructions
We modify the current stack VM instructions to use registers as operands and 
results.
Registers are designated by `$k` where `k` is the register index. `*reg` 
indicates a list of registers.
### AllocTensor
```
AllocTenosr $1, ndim, *shape, dtype  ; %1 = AllocTensor(ndim, *shape, dtype)
```
Allocate a tensor object and stores to register `$1`.
### AllocDatatype
```
AllocDatatype $1, tag, num_fields, *regs  ; %1 = AllocDatatype(tag, num_fields, 
*regs)
```
Allocate a datatype object and stores to register `$1`. `*regs` is a list of 
registers containing fields in the datatype.
### AllocClosure
```
AllocClosure $1, func_index, num_freevars, *regs  ; %1 = 
AllocClosure(func_index, num_freevars, *regs)
```
Allocate a closure object, where there are `num_freevars` in `*regs`.
### LoadConst
```
LoadConst $1, const_index  ; %1 = LoadConst(const_index)
```
Load the constant at `const_index` from the constant pool.

### Mov
```
Mov $2, $1  ; %2 = Mov(%1)
```
Create a reference to VMObject in register `$1` and stores it in `$2`.

Note: No data copy happens in this instruction. The real data copy should be a 
PackedFunction.
### Phi
```
Phi $3, $1, $2  ; %3 = Phi(%1, %2)
```
Takes the `VMObject` either in `$1` or `$2` and stores in `$3`.

Note: This instruction requires VMObject in register `$1` and `$2` having the 
same type, and only one of them should be valid during the runtime.
### Ret
```
Ret $1
```
Returns the register `$1`.
### GetField
```
GetField $2, $1, field_index  ; %2 = GetField(%1, field_index)
```
Get the field at `field_index` in `$1`.
### If
```
If $1, true_offset, false_offset
```
Check if `$1` is true or false. If true relative jump by true_branch, else 
relative jump by false_branch.
### Goto
```
Goto pc_offset
```
Relative unconditional jump by `pc_offset`.
### InvokePacked
```
InvokePacked packed_index, arity, output_size, *regs
```
Invoke the packed function denoted by `packed_index`

Note: Number of registers in `*regs` should be `arity + output_size` where 
first `arity` registers are arguments and rest are output registers.
### Invoke
```
Invoke $1, func_index, arity, *regs  ; %1 = Invoke(func_index, arity, *regs)
```
Invoke VM function at `func_index`

Note: Number of registers in `*regs` should be `arity`. Register `$1` will be a 
reference to the VMObject in the return register.
### InvokeClosure
```
InvokeClosure $2, $1, arity, *regs  ; %2 = InvokeClosure(%1, arity, *regs)
```
Invoke closure function in register `$2` and save the result into register `$2`.

Note: Register `$2` must be a `VMClosureObject`.

## Stack and State
In order to convert to register-based VM, we also need to adjust the data 
structure in the VM. We assume there are infinite registers available, and each 
function has its own register file. Each time an `Invoke` instruction is 
executed, the runtime creates a new `VMFrame`. We pre-allocate max number of 
registers used in the callee function (this number can be derived during 
`VMCompile`) and assigns the arguments to the first `args` slots in the 
registers.
```
struct VMFrame {
  // Current function
  size_t func_index;
  // Pre-allocate max number of registers used in the functions
  std::vector<VMObject> registers;
  // Number of arguments
  size_t args;
  // Pointer into the current function's instructions.
  const Instruction* code;
  // Current program counter
  size_t pc;
};
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2915

[dmlc/tvm] [RFC] Register Relay VM design (#2915)

Reply via email to