areusch commented on a change in pull request #46: URL: https://github.com/apache/tvm-rfcs/pull/46#discussion_r802025838
########## File path: rfcs/0046-module-based-model-runtime-for-aot.md ########## @@ -0,0 +1,348 @@ +# Module-based Model Runtime Interface for AOT + +- Feature Name: module_based_model_runtime_for_aot +- Start Date: 2021-09-17 +- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046) +- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000) + +# **Summary** + +This RFC describes a [Module-based Model Runtime +interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for +the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby +enabling its use from the TVM C++ Runtime. + +# **Motivation** + +The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled +Relay models. At the time of writing, it's now possible to codegen a TIR function which executes +Relay models that have known shapes, don't have graph-level control flow, and execute only on the +CPU device. Right now, the C runtime is the only such runtime environment which can interact with +this generated code. However, significant interest exists in enabling the C++ runtime to use the +Ahead-of-Time executor. + +# **Guide-level explanation** + +Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow +(e.g. `[tvm.relay.build](http://tvm.relay.build)`) by including `--executor=aot` in the Target +[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module +object. 
Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`: + +```bash +ir_mod = tvm.parser.fromtext("""\ + #[version = "0.0.5"] + def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) { + %0 = %a + %b; + %0 + }""" + ) + +with PassConfig(opt_level=3): + factory : AotExecutorFactory = tvm.relay.build( + ir_mod, "llvm -executor=aot", module_name="my_mod") + +aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0)) +``` + +`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used as a +user normally would `GraphExecutor`: + +```bash +aot_executor.set_input("a", tvm.nd.array(np.ndarray([1, 2], dtype="uint8"))) +aot_executor.set_input("b", tvm.nd.array(np.ndarray([3, 5], dtype="uint8"))) +aot_exec.run() +output = aot_exec.get_output(0) +assert output.asnumpy() == np.ndarray([5, 7], dtype="uint8") +``` + +[1] NOTE: The target string is not the final place this customization should be made. However, it's +been the place where we've been putting runtime-related stuff. A separate RFC will split the Target +string into Target options (which affect tuning) and runtime options. + +# **Reference-level explanation** + +Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function +which invokes the Relay operators (implemented in TIR) in a correct order. An example is given +below: + +```bash +PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} { + // attr [(nullptr)] device_id = 0 + // attr [(nullptr)] device_type = 1 + tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output) +} +``` + +The AotExecutor then needs to accomplish the following to meet Module-based Model Runtime Interface: + +1. 
Allocate input and output tensors as defined in the `run_model` function using the correct Device

Review comment:

> I suppose the proposal here is we create a runtime wrapper around it -- which also works but with cons of not exposing these allocations to the core compiler for further optimization.

I actually think we should defer the question of how the input tensors are loaded and consumed to the runtime. You're right that AOTExecutor here is intended to be a generic runtime wrapper when there are enough resources on the system to handle this at runtime. However, consider a microTVM use case where DMA is used to copy data from e.g. a camera into an SRAM dedicated to accelerator usage. In this case, TVM should really do as much as possible to stay out of the way--the copy operation is application- or at least SoC-specific. TVM should provide pointer and sizing information so this can be handled separately. That argues for exposing some type of `get_input_tensor` function as you're mentioning here as a first-class citizen of MBMR. I think we should take that up as we move on the C Device API and USMP integrations.

> However, Im curious to know whether it would just easier to create a copy in the main body to tir.allocate node that get translated to a device copy -- which I think has the same effect.

I think I see what you're raising here--we have to explicitly inject the input and output tensors into USMP if they are not modeled by a `tir.allocate` somewhere. We could model them without specially injecting them by wrapping the TIR top-level function in a MBMR-compatible TIR function which contains `tir.allocate` nodes, IIUC. I'm not sure how we get away from needing to do this at present, though--for `DLDevice(kDLCPU, 0)`, no such copy is needed, but we would then still need the `tir.allocate` nodes to make the input tensors visible to USMP.
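To make the "pointer and sizing information" idea a bit more concrete, here is a minimal Python sketch. It is not an existing TVM API: `InputTensorInfo`, its fields, and `copy_into` are hypothetical names, and a `bytearray` stands in for the dedicated SRAM region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputTensorInfo:
    """Hypothetical record a runtime could expose instead of doing the copy itself."""
    name: str
    offset: int  # byte offset of the tensor inside an application-owned memory region
    nbytes: int  # size of the tensor in bytes


def copy_into(pool: bytearray, info: InputTensorInfo, data: bytes) -> None:
    """Stand-in for an application- or SoC-specific copy (e.g. DMA) into the pool."""
    if len(data) != info.nbytes:
        raise ValueError(f"{info.name}: expected {info.nbytes} bytes, got {len(data)}")
    pool[info.offset:info.offset + info.nbytes] = data


# The application, not the runtime, decides how and when data lands in the buffer.
pool = bytearray(16)  # stands in for a dedicated SRAM region
a = InputTensorInfo(name="a", offset=0, nbytes=2)
copy_into(pool, a, bytes([1, 2]))
```

The point of the sketch is only the division of labour: the runtime describes where the tensor lives and how big it is, and the copy itself stays outside of it.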
Additionally, I imagine in the double-buffered input use case, the user would need to somehow communicate that space for two input buffers is needed. Not sure how to do that now outside of downsizing a memory pool.

> I would propose being explicit here that this is a runtime wrapper. Possibly, would it be possible to add a few notes that why/ why we should not do it the codegen of main ?

Added.

########## File path: rfcs/0046-module-based-model-runtime-for-aot.md ########## @@ -0,0 +1,348 @@ +# Module-based Model Runtime Interface for AOT + +- Feature Name: module_based_model_runtime_for_aot +- Start Date: 2021-09-17 +- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046) +- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000) + +# **Summary** + +This RFC describes a [Module-based Model Runtime +interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for +the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby +enabling its use from the TVM C++ Runtime. + +# **Motivation** + +The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled +Relay models. At the time of writing, it's now possible to codegen a TIR function which executes +Relay models that have known shapes, don't have graph-level control flow, and execute only on the +CPU device. Right now, the C runtime is the only such runtime environment which can interact with +this generated code. However, significant interest exists in enabling the C++ runtime to use the +Ahead-of-Time executor. + +# **Guide-level explanation** + +Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow +(e.g. `[tvm.relay.build](http://tvm.relay.build)`) by including `--executor=aot` in the Target +[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module +object.
Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`: + +```bash +ir_mod = tvm.parser.fromtext("""\ + #[version = "0.0.5"] + def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) { + %0 = %a + %b; + %0 + }""" + ) + +with PassConfig(opt_level=3): + factory : AotExecutorFactory = tvm.relay.build( + ir_mod, "llvm -executor=aot", module_name="my_mod") + +aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0)) +``` + +`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used as a +user normally would `GraphExecutor`: + +```bash +aot_executor.set_input("a", tvm.nd.array(np.ndarray([1, 2], dtype="uint8"))) +aot_executor.set_input("b", tvm.nd.array(np.ndarray([3, 5], dtype="uint8"))) +aot_exec.run() +output = aot_exec.get_output(0) +assert output.asnumpy() == np.ndarray([5, 7], dtype="uint8") +``` + +[1] NOTE: The target string is not the final place this customization should be made. However, it's +been the place where we've been putting runtime-related stuff. A separate RFC will split the Target +string into Target options (which affect tuning) and runtime options. + +# **Reference-level explanation** + +Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function +which invokes the Relay operators (implemented in TIR) in a correct order. An example is given +below: + +```bash +PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} { + // attr [(nullptr)] device_id = 0 + // attr [(nullptr)] device_type = 1 + tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output) +} +``` + +The AotExecutor then needs to accomplish the following to meet Module-based Model Runtime Interface: + +1. Allocate input and output tensors as defined in the `run_model` function using the correct Device + API. +2. Provide a mapping from relay parameter name to positional argument. +3. 
Invoke the generated TIR function and provide profiling. + +### Compiler ↔ Runtime Metadata + +In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to +be communicated from Compiler to Runtime: + +- The mapping between Relay parameter name and TIR argument position +- The number of inputs and outputs +- The type of each parameter +- Information sufficient to choose a Device API to allocate memory for that data. + +At present, Metadata is passed from Compiler to Runtime in several different ways: + +1. Constant DLTensor can be bundled with code and supplied to `runtime::Module` via + `runtime::MetadataModule` +2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`, + `vulkan`) have adopted the convention of including a + [1runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106) + (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization: + + ```bash + /*! \brief function information needed by device */ + struct FunctionInfo { + std::string name; + std::vector<DLDataType> arg_types; + std::vector<std::string> launch_param_tags; + } + ``` + +3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the + graph-level + [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55): + + ```bash + /*! + * \brief Structure that can be optionally used by the executor codegen + */ + class MetadataNode : public Object { + public: + /*! \brief input information for the main function */ + Array<String> inputs; + /*! \brief number of outputs of the main function */ + int num_outputs = 1; + /*! \brief the executor to be used to run the model */ + String executor = kTvmExecutorGraph; + + String mod_name = ""; + } + ``` + +4. 
The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo` which + communicates statistics about memory usage and I/O operation for each TIR operator and aggregate + statistics for the top-level AOT function: + + ```bash + struct FunctionInfoNode : public Object { + Map<Target, Integer> workspace_sizes; + Map<Target, Integer> io_sizes; + Map<Target, Integer> constant_sizes; + Map<Target, tir::PrimFunc> tir_primfuncs; + Map<Target, Function> relay_primfuncs; + } + ``` + + +Some duplication of information is already present. Likely this is due in part to the existing +middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another +factor may be: since `runtime::Module` are responsible for their own serialization, and passing +`Node` across `PackedFunc` requires a cast, the lack of a centralized facility for +`runtime::Modules` to obtain module-level Metadata has led backend authors to roll their own. This +pattern means that it's very difficult to assess the full scope of metadata handed to the runtime, +particularly across all backends. + +Work is currently ongoing to unify the pre-codegen `IRModule` into a single instance. After this +work is completed, it will be much easier to produce a centralized module-level Metadata. This RFC +argues for the expansion of `runtime::MetadataNode` in the following ways:

Review comment:

Ah, I see; changed the wording. I guess I was trying to say that the name `MetadataNode` should cover a more complex thing, but in practice you're right that this proposes a restructuring.
########## File path: rfcs/0046-module-based-model-runtime-for-aot.md ##########
+4. 
The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo` which + communicates statistics about memory usage and I/O operation for each TIR operator and aggregate + statistics for the top-level AOT function: + + ```bash + struct FunctionInfoNode : public Object { + Map<Target, Integer> workspace_sizes; + Map<Target, Integer> io_sizes; + Map<Target, Integer> constant_sizes; + Map<Target, tir::PrimFunc> tir_primfuncs; + Map<Target, Function> relay_primfuncs; + } + ``` + + +Some duplication of information is already present. Likely this is due in part to the existing +middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another

Review comment:

Hmm, I'm not quite sure I follow you here. I'm happy to add a reference to the Artifact proposal, but I'm not sure it's quite exactly what I'm stating here.

Here, what I mean is that the TIR-to-Runtime interface is `IRModule -> (tree of runtime.Module)`. The existing MetadataModule (which this RFC proposes renaming to ConstLoaderModule) seems to have arisen out of a desire to build common infrastructure to handle loading DLTensor from `.text` in the C++ runtime. What I'm trying to point out is that, since the TIR-to-Runtime interface provides no facility for the TIR-to-runtime processes to return metadata outside of the `runtime::Module`, information ends up duplicated whenever the compiler also needs it. For example, `constant_sizes` could be deduced from the DLTensor passed to ConstLoaderModule, but ConstLoaderModule is not supported by all runtimes and is not the de facto way to load constant data or metadata at runtime, because it doesn't support encoding structs and scalar values. You can also see some duplication in CudaModule.

I think this proposal is attempting to start down the path of unifying these different methods of providing data generated during lowering to the runtime as Metadata.
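As a rough illustration of the `constant_sizes` point, the sketch below derives the total constant size from descriptions of the constant tensors instead of storing it as a separate field. The `constants` layout (name mapped to shape and dtype), `DTYPE_BYTES`, and `constant_size_bytes` are all assumptions made for this example, not TVM structures.

```python
from math import prod

# Illustrative subset of dtypes; a real implementation would read DLDataType.
DTYPE_BYTES = {"uint8": 1, "int32": 4, "float32": 4}


def constant_size_bytes(constants):
    """Sum the byte sizes of constant tensors given as name -> (shape, dtype)."""
    return sum(prod(shape) * DTYPE_BYTES[dtype] for shape, dtype in constants.values())


# Hypothetical constants, described only by shape and dtype.
constants = {"p0": ((3, 3), "float32"), "p1": ((16,), "uint8")}
total = constant_size_bytes(constants)
```

If the same tensor descriptions are available to both compiler and runtime, a separately serialized size field carries no extra information, which is the kind of duplication described above.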
I think that's mainly covered here, but happy to add a reference to the Artifact thing if it helps, it just seems a bit orthogonal to me.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
