areusch commented on a change in pull request #46:
URL: https://github.com/apache/tvm-rfcs/pull/46#discussion_r807136345



##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+# Module-based Model Runtime Interface for AOT
+
+- Feature Name: module_based_model_runtime_for_aot
+- Start Date: 2021-09-17
+- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# **Summary**
+
+This RFC describes a [Module-based Model Runtime
+interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
+the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby
+enabling its use from the TVM C++ Runtime.
+
+# **Motivation**
+
+The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
+Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
+Relay models that have known shapes, don't have graph-level control flow, and execute only on the
+CPU device. Right now, the C runtime is the only such runtime environment which can interact with
+this generated code. However, significant interest exists in enabling the C++ runtime to use the
+Ahead-of-Time executor.
+
+# **Guide-level explanation**
+
+Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow
+(e.g. `tvm.relay.build`) by including `--executor=aot` in the Target [1]. The return value of
+`tvm.relay.build` in this case is an `AotExecutorFactory` Module object. Users instantiate the AOT
+executor via `AotExecutorFactory` as they do with `GraphExecutor`:
+
+```python
+ir_mod = tvm.parser.fromtext("""\
+      #[version = "0.0.5"]
+      def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
+          %0 = %a + %b;
+          %0
+      }""")
+
+with tvm.transform.PassContext(opt_level=3):
+  factory : AotExecutorFactory = tvm.relay.build(
+       ir_mod, "llvm -executor=aot", module_name="my_mod")
+
+aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0))
+```
+
+`AotExecutor` supports the traditional Module-based Model Runtime Interface and can be used just as
+a user would normally use `GraphExecutor`:
+
+```python
+aot_executor.set_input("a", tvm.nd.array(np.array([1, 2], dtype="uint8")))
+aot_executor.set_input("b", tvm.nd.array(np.array([3, 5], dtype="uint8")))
+aot_executor.run()
+output = aot_executor.get_output(0)
+assert (output.asnumpy() == np.array([4, 7], dtype="uint8")).all()
+```
+
+[1] NOTE: The target string is not the final place this customization should be made. However, it's
+been the place where we've been putting runtime-related options. A separate RFC will split the
+Target string into Target options (which affect tuning) and runtime options.
+
+# **Reference-level explanation**
+
+Already committed to TVM is the AotExecutorCodegen. This module produces a top-level TIR function
+which invokes the Relay operators (implemented in TIR) in a correct order. An example is given
+below:
+
+```
+PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
+  // attr [(nullptr)] device_id = 0
+  // attr [(nullptr)] device_type = 1
+  tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
+}
+```
+
+The AotExecutor then needs to accomplish the following to meet the Module-based Model Runtime
+Interface:
+
+1. Allocate input and output tensors as defined in the `run_model` function using the correct Device
+   API.
+2. Provide a mapping from Relay parameter name to positional argument.
+3. Invoke the generated TIR function and provide profiling.
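+The three requirements above can be sketched in plain Python. This is a hypothetical illustration
+only, not TVM's actual implementation: `AotExecutorSketch`, `fake_run_model`, and the list-based
+buffers are invented here to show how a name-to-position mapping lets a named `set_input` API sit
+on top of a purely positional generated function.
+
+```python
+# Hypothetical sketch: an AotExecutor-like wrapper over a positional runner.
+class AotExecutorSketch:
+    def __init__(self, input_names, num_outputs, run_model):
+        self._input_names = list(input_names)
+        self._run_model = run_model
+        # Requirement (1): pre-allocate output buffers up front (one-element
+        # lists stand in for device-allocated tensors in this sketch).
+        self._outputs = [[None] for _ in range(num_outputs)]
+        self._inputs = [None] * len(input_names)
+
+    def set_input(self, name, value):
+        # Requirement (2): map a Relay parameter name to its argument position.
+        self._inputs[self._input_names.index(name)] = value
+
+    def run(self):
+        # Requirement (3): invoke the generated function positionally
+        # (inputs first, then outputs, mirroring the run_model signature).
+        self._run_model(*self._inputs, *self._outputs)
+
+    def get_output(self, index):
+        return self._outputs[index][0]
+
+
+def fake_run_model(a, b, out):
+    # Stand-in for the generated tvmgen_my_mod_run_model TIR function.
+    out[0] = a + b
+
+ex = AotExecutorSketch(["a", "b"], num_outputs=1, run_model=fake_run_model)
+ex.set_input("a", 1)
+ex.set_input("b", 3)
+ex.run()
+print(ex.get_output(0))  # 4
+```
+
+The real implementation must also consult per-parameter metadata (type, device) when allocating,
+which is exactly the information discussed in the next section.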
+
+### Compiler ↔ Runtime Metadata
+
+In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to
+be communicated from Compiler to Runtime:
+
+- The mapping between Relay parameter name and TIR argument position
+- The number of inputs and outputs
+- The type of each parameter
+- Information sufficient to choose a Device API to allocate memory for that data.
+
+At present, metadata is passed from Compiler to Runtime in several different ways:
+
+1. Constant DLTensors can be bundled with code and supplied to `runtime::Module` via
+   `runtime::MetadataModule`.
+2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`,
+   `vulkan`) have adopted the convention of including a
+   [`runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
+   (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization:
+
+    ```cpp
+    /*! \brief function information needed by device */
+    struct FunctionInfo {
+      std::string name;
+      std::vector<DLDataType> arg_types;
+      std::vector<std::string> launch_param_tags;
+    };
+    ```
+
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
+   graph-level
+   [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):
+
+    ```cpp
+    /*!
+     * \brief Structure that can be optionally used by the executor codegen
+     */
+    class MetadataNode : public Object {
+     public:
+      /*! \brief input information for the main function */
+      Array<String> inputs;
+      /*! \brief number of outputs of the main function */
+      int num_outputs = 1;
+      /*! \brief the executor to be used to run the model */
+      String executor = kTvmExecutorGraph;
+
+      String mod_name = "";
+    };
+    ```
+
+4. The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo`, which
+   communicates statistics about memory usage and I/O operations for each TIR operator, plus
+   aggregate statistics for the top-level AOT function:
+
+    ```cpp
+    struct FunctionInfoNode : public Object {
+      Map<Target, Integer> workspace_sizes;
+      Map<Target, Integer> io_sizes;
+      Map<Target, Integer> constant_sizes;
+      Map<Target, tir::PrimFunc> tir_primfuncs;
+      Map<Target, Function> relay_primfuncs;
+    };
+    ```
+
+
+Some duplication of information is already present. Likely this is due in part to the existing
+middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another

Review comment:
       per our offline discussion, clarified how the metadata is carried through the current compiler design, removed references to un-RFC'd design efforts and replaced them with text to motivate them. also clarified some wording; ptal




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
