areusch commented on a change in pull request #46:
URL: https://github.com/apache/tvm-rfcs/pull/46#discussion_r806455336
##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+# Module-based Model Runtime Interface for AOT
+
+- Feature Name: module_based_model_runtime_for_aot
+- Start Date: 2021-09-17
+- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# **Summary**
+
+This RFC describes a [Module-based Model Runtime
+interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
+the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby
+enabling its use from the TVM C++ Runtime.
+
+# **Motivation**
+
+The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
+Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
+Relay models that have known shapes, don't have graph-level control flow, and execute only on the
+CPU device. Right now, the C runtime is the only such runtime environment which can interact with
+this generated code. However, significant interest exists in enabling the C++ runtime to use the
+Ahead-of-Time executor.
+
+# **Guide-level explanation**
+
+Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow
+(e.g. `[tvm.relay.build](http://tvm.relay.build)`) by including `--executor=aot` in the Target
+[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module
+object. Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`:
+
+```bash
+ir_mod = tvm.parser.fromtext("""\
+  #[version = "0.0.5"]
+  def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
+    %0 = %a + %b;
+    %0
+  }"""
+  )
+
+with PassConfig(opt_level=3):
+  factory : AotExecutorFactory = tvm.relay.build(
+    ir_mod, "llvm -executor=aot", module_name="my_mod")
+
+aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0))
+```
+
+`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used as a
+user normally would `GraphExecutor`:
+
+```bash
+aot_executor.set_input("a", tvm.nd.array(np.ndarray([1, 2], dtype="uint8")))
+aot_executor.set_input("b", tvm.nd.array(np.ndarray([3, 5], dtype="uint8")))
+aot_exec.run()
+output = aot_exec.get_output(0)
+assert output.asnumpy() == np.ndarray([5, 7], dtype="uint8")
+```
+
+[1] NOTE: The target string is not the final place this customization should be made. However, it's
+been the place where we've been putting runtime-related stuff. A separate RFC will split the Target
+string into Target options (which affect tuning) and runtime options.
+
+# **Reference-level explanation**
+
+Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function
+which invokes the Relay operators (implemented in TIR) in a correct order. An example is given
+below:
+
+```bash
+PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
+  // attr [(nullptr)] device_id = 0
+  // attr [(nullptr)] device_type = 1
+  tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
+}
+```
+
+The AotExecutor then needs to accomplish the following to meet Module-based Model Runtime Interface:
+
+1. Allocate input and output tensors as defined in the `run_model` function using the correct Device

Review comment:
ah i see. yeah this makes sense.
i'm wary of introducing too much complexity here particularly when user/platform intervention may be required to implement the double-buffer (e.g. if DMA is used to fill the buffer while the SoC is sleeping). it would be great to continue discussing this in a follow-on!

+   API.
+2. Provide a mapping from relay parameter name to positional argument.
+3. Invoke the generated TIR function and provide profiling.
+
+### Compiler ↔ Runtime Metadata
+
+In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to
+be communicated from Compiler to Runtime:
+
+- The mapping between Relay parameter name and TIR argument position
+- The number of inputs and outputs
+- The type of each parameter
+- Information sufficient to choose a Device API to allocate memory for that data.
+
+At present, Metadata is passed from Compiler to Runtime in several different ways:
+
+1. Constant DLTensor can be bundled with code and supplied to `runtime::Module` via
+   `runtime::MetadataModule`
+2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`,
+   `vulkan`) have adopted the convention of including a
+   [1runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
+   (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization:
+
+   ```bash
+   /*! \brief function information needed by device */
+   struct FunctionInfo {
+     std::string name;
+     std::vector<DLDataType> arg_types;
+     std::vector<std::string> launch_param_tags;
+   }
+   ```
+
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
+   graph-level
+   [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):
+
+   ```bash
+   /*!
+    * \brief Structure that can be optionally used by the executor codegen
+    */
+   class MetadataNode : public Object {
+    public:
+     /*! \brief input information for the main function */
+     Array<String> inputs;
+     /*! \brief number of outputs of the main function */
+     int num_outputs = 1;
+     /*! \brief the executor to be used to run the model */
+     String executor = kTvmExecutorGraph;
+
+     String mod_name = "";
+   }
+   ```
+
+4. The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo` which
+   communicates statistics about memory usage and I/O operation for each TIR operator and aggregate
+   statistics for the top-level AOT function:
+
+   ```bash
+   struct FunctionInfoNode : public Object {
+     Map<Target, Integer> workspace_sizes;
+     Map<Target, Integer> io_sizes;
+     Map<Target, Integer> constant_sizes;
+     Map<Target, tir::PrimFunc> tir_primfuncs;
+     Map<Target, Function> relay_primfuncs;
+   }
+   ```
+
+
+Some duplication of information is already present. Likely this is due in part to the existing
+middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another

Review comment:
I see--so this is more about the wording a couple paragraphs down, no?

> Work is currently ongoing to unify the pre-codegen IRModule into a single instance. After this work is completed, it will be much easier to produce a centralized module-level Metadata.

cc @jroesch i am actually not sure if there is an RFC describing this. i'm hoping to describe my ambitions to "link" metadata into TIR via `tir.load_metadata` node in a following RFC (and this is really what requires this consolidation).

Review comment:
I see--so this is more about the wording a couple paragraphs down, no?

> Work is currently ongoing to unify the pre-codegen IRModule into a single instance. After this work is completed, it will be much easier to produce a centralized module-level Metadata.

cc @jroesch i am actually not sure if there is an RFC describing this.
i'm hoping to describe my ambitions to "link" metadata into TIR via `tir.load_metadata` node in a following RFC, and I definitely would need consolidated metadata for this at the IRModule level. I'm not sure if there is anything in code-generation that strictly requires this--it's just cleaner in my book. Let me know if I'm missing anything here.

Review comment:
per our offline discussion, clarified how the metadata is carried through the current compiler design, removed references to un-RFC'd design efforts and replaced with text to motivate them. also clarified some wording--ptal
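
For readers following the metadata discussion in this thread: the items the quoted RFC text says must pass from Compiler to Runtime (name-to-position mapping, input/output counts, per-parameter type, and device information) can be pictured as one small record. The sketch below is illustrative only. `TensorInfo`, `RunModelMetadata`, and `arg_positions` are hypothetical names, not TVM APIs, and it assumes arguments are ordered inputs first, then outputs, as in the `PrimFunc([input1, input2, output])` example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical description of one `run_model` argument (not a TVM type)."""
    name: str               # Relay parameter name, e.g. "a"
    shape: Tuple[int, ...]  # static shape; the RFC assumes known shapes
    dtype: str              # element type, e.g. "uint8"
    device_type: int        # DLPack device type code; 1 is kDLCPU


@dataclass(frozen=True)
class RunModelMetadata:
    """Hypothetical record of what a runtime would need to call `run_model`."""
    inputs: List[TensorInfo]
    outputs: List[TensorInfo]

    def arg_positions(self) -> Dict[str, int]:
        # Assumption: positional order is all inputs, then all outputs.
        return {t.name: i for i, t in enumerate(self.inputs + self.outputs)}


# Values mirror the two-input add example quoted from the RFC.
meta = RunModelMetadata(
    inputs=[
        TensorInfo("a", (1, 2), "uint8", 1),
        TensorInfo("b", (1, 2), "uint8", 1),
    ],
    outputs=[TensorInfo("output", (1, 2), "uint8", 1)],
)
positions = meta.arg_positions()
```

A runtime holding such a record could resolve a `set_input("a", ...)` call to an argument slot via `positions["a"]` and pick an allocator from `device_type`. How TVM actually serializes this information is what the RFC and the comments above are still working out.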
