This is an automated email from the ASF dual-hosted git repository.

manupa pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
     new d9dd6eb  Module Based Model Runtime for AOT (#46)
d9dd6eb is described below

commit d9dd6eb5e522ff169fe3bf4f5b9c5c3e84d95145
Author: Andrew Reusch <[email protected]>
AuthorDate: Tue Feb 15 09:59:25 2022 -0800

    Module Based Model Runtime for AOT (#46)
    
    * Module Based Model Runtime for AOT.
    
    * Address manupa comments
    
    * Address manupa comments
    
    * address manupa comments
---
 rfcs/0046-module-based-model-runtime-for-aot.md | 363 ++++++++++++++++++++++++
 1 file changed, 363 insertions(+)

diff --git a/rfcs/0046-module-based-model-runtime-for-aot.md 
b/rfcs/0046-module-based-model-runtime-for-aot.md
new file mode 100644
index 0000000..ac71724
--- /dev/null
+++ b/rfcs/0046-module-based-model-runtime-for-aot.md
@@ -0,0 +1,363 @@
+# Module-based Model Runtime Interface for AOT
+
+- Feature Name: module_based_model_runtime_for_aot
+- Start Date: 2021-09-17
+- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# **Summary**
+
+This RFC describes a [Module-based Model Runtime
+interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
+the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206),
+thereby enabling its use from the TVM C++ Runtime.
+
+# **Motivation**
+
+The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
+Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
+Relay models that have known shapes, don't have graph-level control flow, and execute only on the
+CPU device. Right now, the C runtime is the only runtime environment which can interact with this
+generated code. However, significant interest exists in enabling the C++ runtime to use the
+Ahead-of-Time Executor.
+
+# **Guide-level explanation**
+
+Users select the AOT executor at compile time through the traditional GraphExecutor compilation
+flow (e.g. `tvm.relay.build`) by including `--executor=aot` in the Target [1]. The return value of
+`tvm.relay.build` in this case is an `AotExecutorFactory` Module object. Users instantiate the AOT
+executor via `AotExecutorFactory` as they do with `GraphExecutor`:
+
+```python
+ir_mod = tvm.parser.fromtext("""\
+      #[version = "0.0.5"]
+      def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
+          %0 = %a + %b;
+          %0
+      }"""
+    )
+
+with tvm.transform.PassContext(opt_level=3):
+  factory: AotExecutorFactory = tvm.relay.build(
+      ir_mod, "llvm -executor=aot", module_name="my_mod")
+
+aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0))
+```
+
+`AotExecutor` supports the traditional Module-based Model Runtime Interface and can be used just as
+a user would normally use `GraphExecutor`:
+
+```python
+aot_executor.set_input("a", tvm.nd.array(np.array([[1, 2]], dtype="uint8")))
+aot_executor.set_input("b", tvm.nd.array(np.array([[3, 5]], dtype="uint8")))
+aot_executor.run()
+output = aot_executor.get_output(0)
+assert (output.asnumpy() == np.array([[4, 7]], dtype="uint8")).all()
+```
+
+[1] NOTE: The target string is not the final place this customization should be made. However, it's
+been the place where we've been putting runtime-related options. A separate RFC will split the
+Target string into Target options (which affect tuning) and runtime options.
+
+# **Reference-level explanation**
+
+The AotExecutorCodegen is already committed to TVM. This module produces a top-level TIR function
+which invokes the Relay operators (implemented in TIR) in the correct order. An example is given
+below:
+
+```
+PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
+  // attr [(nullptr)] device_id = 0
+  // attr [(nullptr)] device_type = 1
+  tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
+}
+```
+
+The AotExecutor is a runtime wrapper component around this function. To meet the Module-based Model
+Runtime Interface, it needs to accomplish the following:
+
+1. Allocate input and output tensors as defined in the `run_model` function 
using the correct Device
+   API.
+2. Provide a mapping from relay parameter name to positional argument.
+3. Invoke the generated TIR function and provide profiling.
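
As a non-authoritative sketch of item (2), the mapping from Relay parameter names to positional
arguments could be modeled as follows. The class and method names here are hypothetical,
illustrating only the bookkeeping the runtime must maintain:

```python
# Hypothetical sketch: mapping Relay parameter names to positions in the
# flat run_model argument list (inputs first, then outputs).

class NameToArgIndexMap:
    """Maps Relay parameter names to run_model argument positions."""

    def __init__(self, input_names, num_outputs):
        # Inputs occupy the leading argument slots, in Relay parameter order.
        self._index = {name: i for i, name in enumerate(input_names)}
        self._num_inputs = len(input_names)
        self._num_outputs = num_outputs

    def input_index(self, name):
        return self._index[name]

    def output_index(self, output_num):
        # Outputs follow all inputs in the flat argument list.
        return self._num_inputs + output_num


# Matches the two-input, one-output example model above.
m = NameToArgIndexMap(["a", "b"], num_outputs=1)
```

With this layout, `set_input("b", ...)` writes argument slot 1 and `get_output(0)` reads slot 2.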
+
+In the future, AOT will support heterogeneous execution, e.g. allocating tensors and driving
+inference on a `DLDevice` other than `kDLCPU`. Note that to align this code generator with the
+constrained environment present on a bare-metal microcontroller, the TIR top-level function
+intentionally presumes that the input and output tensors already live on the `DLDevice`. This
+allows the user to decide whether the AotExecutor generic runtime component will be used to fill
+input tensors or whether they prefer to handle this in their application (e.g. through background
+DMA).
+
+### Compiler ↔ Runtime Metadata
+
+In order to implement (1) and (2) above, additional metadata about the 
`run_model` function needs to
+be communicated from Compiler to Runtime:
+
+- The mapping between Relay parameter name and TIR argument position
+- The number of inputs and outputs
+- The type of each parameter
+- Information sufficient to choose a Device API to allocate memory for that 
data.
+
+At present, Metadata is passed from Compiler to Runtime in several different 
ways:
+
+1. Constant `DLTensor`s can be bundled with code and supplied to `runtime::Module` via
+   `runtime::MetadataModule`.
+2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, 
`sdaccel`, `rocm`,
+   `vulkan`) have adopted the convention of including a
+   
[`runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
+   (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their 
serialization:
+
+    ```c++
+    /*! \brief function information needed by device */
+    struct FunctionInfo {
+      std::string name;
+      std::vector<DLDataType> arg_types;
+      std::vector<std::string> launch_param_tags;
+    };
+    ```
+
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of 
producing the
+   graph-level
+   
[`tvm::relay::backend::ExecutorCodegenMetadata`](https://github.com/apache/tvm/blob/c3ace209253507dcb109c12ab8b82575fc668862/src/relay/backend/utils.h#L89):
+
+    ```c++
+    /*!
+     * \brief Structure that can be optionally used by the executor codegen
+     */
+    class MetadataNode : public Object {
+     public:
+      /*! \brief input information for the main function */
+      Array<String> inputs;
+      /*! \brief number of outputs of the main function */
+      int num_outputs = 1;
+      /*! \brief the executor to be used to run the model */
+      String executor = kTvmExecutorGraph;
+
+      String mod_name = "";
+    };
+    ```
+
+4. The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo`, which
+   communicates statistics about memory usage and I/O operations for each TIR operator, plus
+   aggregate statistics for the top-level AOT function:
+
+    ```c++
+    struct FunctionInfoNode : public Object {
+      Map<Target, Integer> workspace_sizes;
+      Map<Target, Integer> io_sizes;
+      Map<Target, Integer> constant_sizes;
+      Map<Target, tir::PrimFunc> tir_primfuncs;
+      Map<Target, Function> relay_primfuncs;
+    };
+    ```
+
+
+Some duplication of information is already present. This is likely due in part to the existing
+middle-end compiler design, in which a separate `IRModule` is produced for each backend. This means
+that any metadata which requires whole-program analysis must be computed by an upstream TIR pass
+and stored on the function whose code-generator needs it, rather than centrally.
+
+Another factor may be that, since each `runtime::Module` is responsible for its own serialization,
+and passing `tvm::Node` across `PackedFunc` requires a cast, the lack of a centralized facility for
+`runtime::Module`s to obtain module-level Metadata has led backend authors to roll their own. This
+pattern means that it's very difficult to assess the full scope of metadata handed to the runtime,
+particularly across all backends.
+
+This RFC argues for creating a centralized `tvm::runtime::metadata::Metadata` 
struct which contains
+all Metadata consumed at runtime. Unifying runtime Metadata allows us to 
reduce the amount of
+serialization logic and eliminate duplication of metadata. The current 
compiler design stores
+centrally-produced Metadata in a side channel, but this could be improved in 
future RFCs e.g. should
+we move away from splitting IRModules per backend.
+
+This RFC argues for a restructuring of the way we export Metadata through the 
following steps:
+
+1. Rename `runtime::MetadataModule` to `runtime::ConstLoaderModule` to disambiguate it from the
+   runtime Metadata discussed here and to make its purpose clearer.
+2. Expand the function metadata in the existing 
`relay::backend::ExecutorCodegenMetadata` to parity with
+   `runtime::FunctionInfo`, plus include `_sizes` from 
`tvm::relay::transform::FunctionInfoNode` and
+   the required `shape` and `dtype` information from the beginning of this 
section.
+3. Introduce `ModelMetadataModule` to contain this information for use with 
the C++ runtime.
+
+    ```c++
+    class ModelMetadataModule {
+      virtual PackedFunc GetFunction(const std::string& name,
+                                     const ObjectPtr<Object>& sptr_to_self) {
+        if (name == "get_model_metadata") {
+          return PackedFunc([this](TVMArgs args, TVMRetValue* rv) {
+            *rv = ModelMetadata(metadata_);
+          });
+        } else {
+          return PackedFunc();
+        }
+      }
+
+      const struct ModelMetadata* metadata_;
+    };
+    ```
+
+4. Introduce an optional implementation for the C runtime.
+5. Export runtime::Metadata to Model Library Format.
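
The lookup behaviour sketched in step 3 can be illustrated with a rough Python analogue. The
`MockModelMetadataModule` name and the dict-based metadata are stand-ins for illustration, not
TVM API:

```python
# Pure-Python mock of the ModelMetadataModule lookup contract: answer only
# "get_model_metadata", and return nothing (the empty PackedFunc) otherwise.

class MockModelMetadataModule:
    def __init__(self, metadata):
        self._metadata = metadata

    def get_function(self, name):
        if name == "get_model_metadata":
            # Return a copy, mirroring construction of ModelMetadata(metadata_).
            return lambda: dict(self._metadata)
        return None  # analogous to returning an empty PackedFunc


mod = MockModelMetadataModule({"module_name": "my_mod", "version": 1})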
+
+The new proposed definition of `runtime::Metadata` is as follows. NOTE that this is a C definition
+because it will be made available to both the C and C++ runtimes. A C++ wrapper will be written.
+
+```c
+struct ParameterInfo {
+  const char* relay_name_hint;
+  const char* tir_name_hint;
+  int64_t* shape;
+  int64_t ndim;
+  DLDataType dtype;
+  TargetDevice target_device;  // NOTE: future addition; not covered in this RFC.
+};
+
+struct FunctionInfo {
+  const char* function_name;
+  struct ParameterInfo* params;
+  int num_inputs;
+  int num_outputs;
+  int64_t workspace_size_bytes;
+  int64_t io_size_bytes;
+  int64_t constant_size_bytes;
+};
+
+struct Metadata {
+  int version;
+  struct FunctionInfo* functions;
+  const char* module_name;
+};
+```
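
For illustration only, a Python `ctypes` mirror of the proposed layout could look like the
following. The field types are assumptions pending the final ABI, and `TargetDevice` is omitted
since it is not covered in this RFC:

```python
# Hypothetical ctypes mirror of the proposed C metadata structs, useful for
# poking at exported metadata from host-side tests.
import ctypes


class DLDataType(ctypes.Structure):
    # Matches the DLPack DLDataType layout.
    _fields_ = [("code", ctypes.c_uint8),
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]


class ParameterInfo(ctypes.Structure):
    _fields_ = [("relay_name_hint", ctypes.c_char_p),
                ("tir_name_hint", ctypes.c_char_p),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("ndim", ctypes.c_int64),
                ("dtype", DLDataType)]


class FunctionInfo(ctypes.Structure):
    _fields_ = [("function_name", ctypes.c_char_p),
                ("params", ctypes.POINTER(ParameterInfo)),
                ("num_inputs", ctypes.c_int),
                ("num_outputs", ctypes.c_int),
                ("workspace_size_bytes", ctypes.c_int64),
                ("io_size_bytes", ctypes.c_int64),
                ("constant_size_bytes", ctypes.c_int64)]


class Metadata(ctypes.Structure):
    _fields_ = [("version", ctypes.c_int),
                ("functions", ctypes.POINTER(FunctionInfo)),
                ("module_name", ctypes.c_char_p)]


md = Metadata(version=1, module_name=b"my_mod")
```

Such a mirror must be kept in lock-step with the C definition; any ABI versioning would hang off
the `version` field.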
+
+### Internal workings of AotExecutor (`--runtime=c++ --interface-api=packed`)
+
+Given the above, we can now sketch out the way AotExecutor should behave (for 
C++ runtime).
+
+Module initialization will:
+
+1. Load the `ModelMetadata` using the `get_model_metadata` PackedFunc.
+2. Allocate space for the parameters to `tvmgen_<model_name>_run_model`.
+3. Lookup and load any linked parameters using the `--link-params` mechanism.
+
+- `set_input`, `get_input`, `get_output` all work as they do in 
`GraphExecutor`.
+- `run` assembles `TVMArgs` containing inputs + outputs and invokes 
`tvmgen_<model_name>_run_model`.
+- `time_evaluator` is implemented in the same way as it is in `GraphExecutor`. 
Timing `run_model` is
+  done using the CPU timer.
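
The `set_input`/`run` flow above can be sketched with a small Python mock; `MockAotExecutor` and
the stand-in `run_model` callable are hypothetical, showing only how arguments are assembled in
inputs-then-outputs order before invoking `tvmgen_<model_name>_run_model`:

```python
# Illustrative mock of the AotExecutor run flow for the C++ runtime.

class MockAotExecutor:
    def __init__(self, run_model, num_inputs, num_outputs):
        # One flat argument list: inputs first, then outputs, matching the
        # generated TIR signature.
        self._run_model = run_model
        self._args = [None] * (num_inputs + num_outputs)
        self._num_inputs = num_inputs

    def set_input(self, index, value):
        self._args[index] = value

    def set_output_buffer(self, output_num, buf):
        self._args[self._num_inputs + output_num] = buf

    def run(self):
        # Analogous to assembling TVMArgs and calling tvmgen_<model>_run_model.
        self._run_model(*self._args)


out = []
ex = MockAotExecutor(lambda a, b, o: o.append(a + b), num_inputs=2, num_outputs=1)
ex.set_input(0, 1)
ex.set_input(1, 2)
ex.set_output_buffer(0, out)
ex.run()
```

The real implementation resolves names to indices via the `ModelMetadata` rather than taking raw
indices.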
+
+### Internal workings of AotExecutor (`--runtime=c --interface-api=packed`)
+
+The C runtime version works in a very similar way, using C accessor functions for the
+`ModelMetadata`.
+
+### No AotExecutor implementation planned (`--runtime=c --interface-api=c`)
+
+When `--interface-api=c` is present in the Target string, the `run_model` function no longer
+accepts the PackedFunc interface and instead accepts `arg_values` directly as positional args:
+
+```c
+TVM_DLL int32_t tvmgen_default_run_model(void* arg0, void* arg1, void* arg2) {
+  void* input = arg0;
+  void* input1 = arg1;
+  void* output = arg2;
+  (void)tvmgen_default_fused_multiply(input, input1, output);
+  return 0;
+}
+```
+
+Additional work is underway to wrap this in a firmware-friendly interface. A 
core design goal of
+this interface is to offload all memory management tasks to the calling code 
to facilitate
+integration with bare-metal embedded devices.
+
+Therefore, it would go against the goals of the C interface to introduce a 
generic runtime wrapper
+compatible with PackedFunc calling convention. It may be possible to do so in 
the future, but it
+would be great to motivate such an implementation with rationale more related 
to the embedded
+runtime setting.
+
+### Operator Calling Convention
+
+TVM uses 3 internal calling conventions:
+
+1. `call_packed` - the traditional calling convention used in the C++ runtime
+2. `call_cpacked` - similar to `call_packed`, but TVM presumes a symbol is 
linked into the binary
+   containing that function name (e.g. `TVMBackendGetFuncFromEnv` is not used 
to lookup the
+   PackedFunc)
+3. `unpacked` - used with microTVM to avoid the overhead of PackedFunc calls in statically-linked
+   binaries. See [AOT optimisations for Embedded Targets
+   
RFC](https://discuss.tvm.apache.org/t/rfc-utvm-aot-optimisations-for-embedded-targets/9849).
+
+The AOT `run_func` can use a different calling convention externally (e.g. 
`--interface-api`) than
+that used internally with Implemented Operators (`--unpacked-args`). However, 
there are some
+circumstances under which not all choices can be used:
+
+- When targeting the C++ runtime: `call_packed` must be used when 
non-DSO-exportable modules exist;
+  otherwise `call_cpacked` may be used. `unpacked` may not be used with AOT 
Executor as the
+  interface has not settled.
+- When targeting the C runtime: any calling convention may be selected for 
either the interface API
+  or the operator calling convention. However, when using `--interface-api=c` 
(e.g. `unpacked`
+  `run_func` calling convention), you must also use the `unpacked` calling 
convention with
+  Implemented Operators.
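
The constraints above can be encoded as a small validity check. This is a sketch, not TVM code;
the function name and flag spellings are illustrative:

```python
# Hypothetical encoding of the allowed calling-convention combinations.

def valid_combination(runtime, interface_api, op_convention,
                      has_non_dso_modules=False):
    """Return True if the interface/operator convention pair is allowed."""
    if runtime == "c++":
        if op_convention == "unpacked":
            return False  # AOT interface with C++ runtime has not settled
        if has_non_dso_modules:
            return op_convention == "packed"  # call_packed required
        return op_convention in ("packed", "cpacked")
    if runtime == "c":
        if interface_api == "c":
            # unpacked run_func implies unpacked operator calls.
            return op_convention == "unpacked"
        return op_convention in ("packed", "cpacked", "unpacked")
    return False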
+
+# **Drawbacks**
+
+Why should we *not* do this?
+
+- This requires quite a bit of rework of the Metadata-passing mechanism, with 
potential for breakage.
+- It also introduces yet another Executor to the runtime to maintain.
+- It may introduce additional constraints on the `<C-runtime, C-interface>` 
implementation, which
+  may make it more difficult to make progress on microTVM.
+
+# **Rationale and alternatives**
+
+- Why is this design the best in the space of possible designs?
+- What other designs have been considered and what is the rationale for not 
choosing them?
+- What is the impact of not doing this?
+
+This RFC doesn't address the question of "why add an AOT executor?" The RFC which added it in the
+first place is a better place to look for that rationale. In general, not following through with
+this RFC would relegate the AOT executor to a C-runtime-only component. There is significant
+interest in AOT from C++ runtime users, and maintaining compatibility with both runtimes increases
+the chances that the AOT executor will support all TVM runtime features.
+
+The controversial pieces of this RFC are addressed as follows:
+
+### Should we maintain a unified approach to code-generating the AOT executor?
+
+An alternative approach could introduce an additional e.g. `aot_cpp_executor_codegen.cc` and create
+a third pathway (in the Graph/AOT build flow). Doing this would allow us to implement
+runtime-specific compiler primitives, which may simplify both pipelines. However, those pipelines
+will soon grow more complicated as features are added to leverage AOT, such as Unified Static
+Memory Planning. The burden of maintaining those features twice outweighs the advantage of a
+simplified implementation. A unified approach also makes it easier for newcomers to understand the
+compiler.
+
+### Should we attempt to unify the Metadata?
+
+Metadata could be left in the scattered form it is now. It may be that the 
implementation of this
+RFC prioritizes expansion of `ModelMetadata` over propagating it to the 
various non-DSO-exportable
+`runtime::Module`. Ultimately though, maintaining separate function-level 
metadata adds confusion
+and code bloat. It also makes it harder to reason about the compiler as a 
whole. For these reasons,
+this RFC advocates for centralizing the Metadata.
+
+# **Prior art**
+
+There is no known prior art of a C++-runtime-compatible AOT implementation.
+
+# **Unresolved questions**
+
+- Who will we break if we unify Model metadata?
+- Will this play nicely with the VM compilation flow when it is unified?
+- How will TargetDevice come in to play here?
+
+# **Future possibilities**
+
+Not covered in this RFC, but particularly useful with the C++ runtime, is heterogeneous execution.
+In the present PoC, AotExecutor will CHECK-fail if a non-CPU device is given. A future
+implementation will annotate the parameters with one of:
+
+- A `device_type` — in which case the mapping from `device_type` to `tvm::Device` will be done in
+  the same way as in `GraphExecutor`
+- A `target_device` — in which case a new mapping will be defined
+
+Aside from that, the larger unresolved issue which makes it difficult to add heterogeneous
+execution is:
+
+- How should AOT codegen invoke the Device API?
+
+Before this question can be answered, some progress needs to be made on the [C 
device
+API](https://discuss.tvm.apache.org/t/pre-rfc-c-device-api/10874) and we need 
to define TIR
+bindings.
