This is an automated email from the ASF dual-hosted git repository.

areusch pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
     new a6f9a25  Additional Target Hooks RFC (#10)
a6f9a25 is described below

commit a6f9a25d68ffa8c7e3de958cf1c309ef7bc62f48
Author: Christopher Sidebottom <[email protected]>
AuthorDate: Mon Sep 27 21:20:11 2021 +0100

    Additional Target Hooks RFC (#10)
    
    * Additional Target Hooks RFC
    
    This is an initial RFC for adding additional hooks onto the `Target` to
    allow splitting up some of the compile flow but also unifying the
    registration of these additional functions.
    
    * Update signatures and integration points for Target Hooks
    
    * Update hooks definition and lowering process
    
    Change-Id: I578145c37f9a10b4c15ed64fa86d6d7c2fade04e
    
    * Re-draw API design for RelayToTIR and RelayToRuntime
    
    Based on the discussions around these hooks, we now have a better idea
    of how to introduce them into the codebase.
    
    * Provide further clarification on hooks and minor text fixes
---
 ...arget-registered-compiler-flow-customisation.md | 196 +++++++++++++++++++++
 rfcs/assets/000x/bypass.png                        | Bin 0 -> 59627 bytes
 2 files changed, 196 insertions(+)

diff --git a/rfcs/0010-target-registered-compiler-flow-customisation.md 
b/rfcs/0010-target-registered-compiler-flow-customisation.md
new file mode 100644
index 0000000..65fc17b
--- /dev/null
+++ b/rfcs/0010-target-registered-compiler-flow-customisation.md
@@ -0,0 +1,196 @@
+- Feature Name: `Target` registered compiler flow customisation
+- Start Date: 2021-07-14
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/10
+- GitHub Issue: https://github.com/apache/tvm/issues/8589
+
+# Summary
+[summary]: #summary
+
+In order to enable flexibility in how individual targets are lowered and built 
within TVM, this RFC proposes additional hooks on the `Target` and that the 
target becomes the central place for such hooks, for example:
+
+```c++
+using FTVMRelayToTIR = Pass;
+using FTVMTIRToRuntime = runtime::TypedPackedFunc<runtime::Module(IRModule, 
Target)>;
+
+TVM_REGISTER_TARGET_KIND("cmsisnn", kDLCPU)
+    .set_attr<FTVMRelayToTIR>("RelayToTIR", CMSISNNLowering)
+    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", CMSISNNCodeGen);
+```
+
+This defines two new hooks as attributes on the target, referencing functions 
registered into the central TVM registry. In similar fashion, external code 
generators (registered under the `relay.ext.` namespace currently) would be 
grouped with an appropriate `Target` as well:
+
+```c++
+using FTVMRelayToRuntime = runtime::TypedPackedFunc<runtime::Module(const 
Function&)>;
+using FTVMConstantUpdater = runtime::TypedPackedFunc<Map<String, 
runtime::NDArray>(Expr, std::string)>;
+
+TVM_REGISTER_TARGET_KIND("ethos-n", kDLCPU)
+    .set_attr<FTVMRelayToRuntime>("RelayToRuntime", EthosNCodeGen)
+    .set_attr<FTVMConstantUpdater>("UpdateConstants", EthosNConstantUpdater);
+```
+
+Collecting all targets under the `Target` functionality (as opposed to 
registering additional `Target`s through the function registry using the 
namespace `relay.ext.`) makes it clearer which hooks apply to each target.
+
+# Motivation
+[motivation]: #motivation
+
+We want to make external code generation (otherwise known as 
[BYOC](https://tvm.apache.org/docs/dev/relay_bring_your_own_codegen.html)) more 
modular; instead of going from a Relay `IRModule` to `runtime::Module` in one 
big step, you can break it into phases and make use of existing transformations 
between phases.
+
+Currently to introduce an external code generator, the entire compilation 
pipeline must be recreated; this is necessary for some targets but in the case 
of simply re-using existing libraries or introducing a function call to use for 
an operator it can become more than is necessary; to implement an external code 
generator requires going directly from Relay to a `runtime::Module` and 
re-implementing any compiler passes and code generation functionality rather 
than being able to extend upon [...]
+
+The generated `runtime::Module` also exists outside of the main graph, meaning 
it can't be inspected in combination with other operators; this limits the 
effectiveness of techniques such as memory planning. By introducing the hook 
`RelayToTIR`, which is similar to the default `LowerTEPass` in that it returns 
TIR, it can be inspected by the memory planner and other analysis passes that 
only work at the TIR level. If all that is necessary is transforming into a 
flat `call_extern` (such is  [...]
+
+In the more complex case, we still want to take advantage of memory planning 
by using `RelayToTIR` and inspecting the liveness within the TIR graph, but 
instead want to generate out more complex calls (such as using the [CMSIS NN 
Structures](https://github.com/ARM-software/CMSIS_5/blob/def6f800f95661eb3451d317f7d0dde504f6020d/CMSIS/NN/Include/arm_nn_types.h#L81-L90));
 the `TIRToRuntime` hook can be used to build our intermediary TIR into a 
Runtime module similarly to how the existing ext [...]
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+As a user, you can pick from additional hooks to bypass certain behaviours of 
the `Target`:
+* `RelayToTIR` - Customize the lowering flow to TIR
+* `TIRToRuntime` - Customize code generation into a runtime module from TIR
+* `RelayToRuntime` - Full compilation flow from Relay to a runtime module
+
+To illustrate where the hooks are placed, please refer to the following 
diagram:
+
+![Diagram showing the splitting point of RelayToRuntime, RelayToTIR and 
TIRToRuntime](./assets/000x/bypass.png)
+
+These can be registered on targets using `set_attr`:
+
+```c++
+TVM_REGISTER_TARGET_KIND("cmsisnn", kDLCPU)
+    .set_attr<FTVMRelayToTIR>("RelayToTIR", CMSISNNLowering)
+    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", CMSISNNCodeGen);
+
+TVM_REGISTER_TARGET_KIND("ethos-n", kDLCPU)
+    .set_attr<FTVMRelayToRuntime>("RelayToRuntime", EthosNCodeGen)
+    .set_attr<FTVMConstantUpdater>("UpdateConstants", EthosNConstantUpdater);
+```
+
+## Relay -> TIR
+With this change, this path splits, depending on whether you want to 
generate a full `Module` or introduce some specific TIR nodes into the code 
generation flow. The `RelayToTIR` hook is a full `IRModule` `Pass` which 
expects that `Function`s will be annotated with either `kTarget` or `kCompiler` 
as part of a previous `Pass`, and the resultant `IRModule` is also expected to 
have any created `PrimFunc`s annotated. The addition of the `RelayToTIR` hook 
allows you to write trivial externa [...]
+
+```c++
+void CallExternalLibraryInTIR(const GlobalVar& new_global_var, const Function& 
func) {
+    tir::Buffer x_buffer = tir::decl_buffer({8}, DataType::Float(32), "x");
+    tir::Var x_var("x", DataType::Handle());
+
+    Map<String, ObjectRef> dict_attrs;
+    dict_attrs.Set("global_symbol", new_global_var->name_hint);
+    dict_attrs.Set("tir.noalias", Bool(true));
+
+    Map<tir::Var, tir::Buffer> buffer_map = {{x_var, x_buffer}};
+    // Pass the buffer's data pointer (`x_buffer->data`) to the extern call.
+    tir::Stmt body =
+        tir::Evaluate(tvm::tir::Call(DataType::Int(8), 
tir::builtin::call_extern(), {x_buffer->data}));
+
+    tir::PrimFunc replacement_func = tir::PrimFunc({x_var}, body, VoidType(),
+                                                    buffer_map, 
DictAttrs(dict_attrs));
+    replacement_func = WithAttr(replacement_func, ::tvm::attr::kTarget, 
host_target_);
+    ir_module_->Add(new_global_var, replacement_func);
+}
+```
+
+This is then registered on a target:
+
+```c++
+TVM_REGISTER_TARGET_KIND("woofles", kDLCPU)
+    .set_attr<FTVMRelayToTIR>("RelayToTIR", 
relay::contrib::woofles::RelayToTIR());
+```
+
+The signature for this hook is the same as any other `Pass`, which takes an 
`IRModule` with `Function`s and returns an `IRModule` with transformed 
`PrimFunc`s. The registered `RelayToTIR` `Pass` is responsible for both 
establishing the `PrimFunc` definitions (with any caching) and rewriting Relay 
calls to those functions. At this time we feel it's not worth worrying about 
code sharing between different custom passes.
+
+## TIR -> Runtime
+Extending from the above, a second hook named `TIRToRuntime` is introduced to 
do further transformations from TIR -> Runtime; this bypasses the 
default `target.build.X` and instead uses the registered `TIRToRuntime` build:
+
+```c++
+runtime::Module BuildWooflesHost(IRModule mod, Target target) {
+// ... Custom Code generation here
+}
+
+TVM_REGISTER_TARGET_KIND("woofles", kDLCPU)
+    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", BuildWooflesHost);
+```
+
+Notably the generation hook is passed the unified `IRModule` and is 
responsible for plucking the `Target` relevant functions into the eventual 
`runtime::Module`.
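
The "plucking" step described above can be pictured with a small, self-contained sketch. It does not use the TVM API: `ToyFunc`, `ToyModule` and `SelectFunctionsForTarget` are hypothetical stand-ins, and the `target_kind` string only models the `kTarget` annotation on a `PrimFunc`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT TVM types.
struct ToyFunc {
  std::string name;
  std::string target_kind;  // models the kTarget annotation on a PrimFunc
};
using ToyModule = std::vector<ToyFunc>;

// Return only the functions annotated for `target_kind`, which is what a
// TIRToRuntime hook would need to do before generating code for its target.
ToyModule SelectFunctionsForTarget(const ToyModule& mod, const std::string& target_kind) {
  assert(!target_kind.empty());  // a hook is always invoked for a concrete target
  ToyModule selected;
  for (const ToyFunc& func : mod) {
    if (func.target_kind == target_kind) {
      selected.push_back(func);
    }
  }
  return selected;
}
```

In this model a hook for `woofles` would call `SelectFunctionsForTarget(mod, "woofles")` and ignore every other function in the unified module.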
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+This functionality is an extension of the existing use of `attr::kCompiler` to 
provide a hint that we can use to look up the attached target attributes. The 
compiler and code generation flows can then choose to store TIR and/or generate 
runtime modules based on the registered hooks.
+
+## Relay to TIR Hook
+[relay-to-tir-hook]: #relay-to-tir-hook
+
+This can be added before the `LowerTEPass`, as a `Pass` which iterates over 
`Target`s and transforms the relevant functions, which will then be skipped by 
the `Function`-level passes until the `PrimFunc` passes begin:
+
+
+```c++
+for (Target target : targets_) {
+    auto target_kind = target->kind;
+    auto map = tvm::TargetKind::GetAttrMap<FTVMRelayToTIR>("RelayToTIR");
+    if (map.count(target_kind)) {
+        ir_mod = map[target_kind](ir_mod, pass_context);
+    }
+}
+```
+
+Placing this above the `LowerTEPass` means any functions which are 
not processed in this way can be processed by the default lowering without 
interfering with `LowerTEPass`. To achieve this, `kCompiler` would initially be 
used to carry the relevant target information, but the goal is to ensure all 
`Target`s are visible as `kTarget`.
+
+```c++
+return 
tvm::transform::Sequential({tvm::relay::transform::RelayToTIRTargetHook(), // 
Additional Pass to call RelayToTIR
+                                    
tvm::transform::CreateModulePass(pass_func, 0, "LowerTE", {}),
+                                    InferType()});
+```
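
The dispatch described in this section (look up a hook per target kind, run it if present, leave everything else for the default lowering) can be modelled without TVM. The sketch below is only an illustration under that assumption: `ToyFunction`, `ToyIRModule`, `ToyHookMap`, `MakeLoweringHook` and `ApplyRelayToTIRHooks` are hypothetical names, and the `lowered` flag stands in for "has been turned into a `PrimFunc`".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT TVM types.
struct ToyFunction {
  std::string name;
  std::string target_kind;  // models the kTarget / kCompiler annotation
  bool lowered;             // true once a hook has "lowered" the function
};
using ToyIRModule = std::vector<ToyFunction>;
using ToyRelayToTIR = std::function<ToyIRModule(ToyIRModule)>;
using ToyHookMap = std::map<std::string, ToyRelayToTIR>;

// Build a hook that marks only the functions of `kind` as lowered.
ToyRelayToTIR MakeLoweringHook(const std::string& kind) {
  return [kind](ToyIRModule mod) {
    for (ToyFunction& func : mod) {
      if (func.target_kind == kind) func.lowered = true;
    }
    return mod;
  };
}

// Run the registered hook of every target kind that has one. Functions of
// target kinds without a hook pass through untouched, so a later default
// lowering step can still process them.
ToyIRModule ApplyRelayToTIRHooks(ToyIRModule mod,
                                 const std::vector<std::string>& target_kinds,
                                 const ToyHookMap& hooks) {
  for (const std::string& kind : target_kinds) {
    auto it = hooks.find(kind);
    if (it != hooks.end()) {
      assert(it->second != nullptr);  // a registered hook must be callable
      mod = it->second(std::move(mod));
    }
  }
  return mod;
}
```

Keeping the hooks in a map keyed by target kind mirrors the attribute-map lookup shown earlier in this section; the real pass would operate on an `IRModule` and a `PassContext` rather than these toy types.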
+
+## TIR to Runtime Hook
+[tir-to-runtime-hook]: #tir-to-runtime-hook
+It is proposed that this hook is implemented as part of `codegen.cc` as a 
direct override of the code generation:
+
+```c++
+runtime::Module Build(IRModule mod, Target target) {
+  if (transform::PassContext::Current()
+          ->GetConfig<Bool>("tir.disable_assert", Bool(false))
+          .value()) {
+    mod = tir::transform::SkipAssert()(mod);
+  }
+
+  if (target->kind->HasHook("TIRToRuntime")) { // Hooked here for Codegen
+      return target->kind->GetAttr<FTVMTIRToRuntime>("TIRToRuntime")(mod, 
target);
+  }
+
+  // the build function.
+  std::string build_f_name = "target.build." + target->kind->name;
+  const PackedFunc* bf = runtime::Registry::Get(build_f_name);
+  ICHECK(bf != nullptr) << build_f_name << " is not enabled";
+  return (*bf)(mod, target);
+}
+```
+See [Relay to TIR Hook](#relay-to-tir-hook) for how the `TargetKind` registry 
would be used.
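
The override-then-fallback order in `Build` can be tried out in isolation with a toy model. This is a sketch only, not the TVM implementation: `ToyBuildFunc`, `ToyRegistry` and `ToyBuild` are hypothetical, modules are represented as plain strings, and two maps stand in for the `TargetKind` attributes and the global function registry.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-ins for illustration only; these are NOT TVM types.
using ToyBuildFunc = std::function<std::string(const std::string&)>;
using ToyRegistry = std::map<std::string, ToyBuildFunc>;

// Prefer a TIRToRuntime hook registered for the target kind; otherwise fall
// back to the function registered as "target.build.<kind>".
std::string ToyBuild(const std::string& mod, const std::string& target_kind,
                     const ToyRegistry& tir_to_runtime_hooks,
                     const ToyRegistry& global_registry) {
  assert(!target_kind.empty());  // building always happens for a concrete target

  auto hook = tir_to_runtime_hooks.find(target_kind);
  if (hook != tir_to_runtime_hooks.end()) {
    return hook->second(mod);  // hooked here for code generation
  }

  const std::string build_f_name = "target.build." + target_kind;
  auto build_f = global_registry.find(build_f_name);
  if (build_f == global_registry.end()) {
    throw std::runtime_error(build_f_name + " is not enabled");
  }
  return build_f->second(mod);
}
```

A target kind with a hook never reaches the registry lookup, which is the behaviour the proposed change to `codegen.cc` relies on.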
+
+## Relay to Runtime Hook
+[relay-to-runtime-hook]: #relay-to-runtime-hook
+This would replace the existing `relay.ext.<target>` lookup in 
`te_compiler.cc` with a `Pass` which runs beforehand, essentially using the 
same logic as [Relay to TIR Hook](#relay-to-tir-hook) to cross reference with 
`kCompiler`.
+
+```c++
+return 
tvm::transform::Sequential({tvm::relay::transform::RelayToTIRTargetHook(),
+                                    
tvm::relay::transform::RelayToRuntimeTargetHook(), // Additional Pass to call 
RelayToRuntime
+                                    
tvm::transform::CreateModulePass(pass_func, 0, "LowerTE", {}),
+                                    InferType()});
+```
+# Drawbacks
+[drawbacks]: #drawbacks
+
+* Different hooks are currently dealt with in quite disparate parts of the 
codebase which are being heavily refactored
+* Introducing custom TIR has the potential to add edge cases to the compiler 
which may uncover new bugs
+
+# Prior art
+[prior-art]: #prior-art
+
+This is all based upon the existing external code generation infrastructure 
within TVM by placing additional hooks in the same areas as existing external 
generation. Rather than replicating this with named functions in the 
`relay.ext.` namespace of the function registry, it begins to follow the 
pattern outlined as B1 in 
https://discuss.tvm.apache.org/t/target-and-attributes/6013/6 by @tqchen.
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+In future, this approach enables rapid integration of anything that can be 
represented in TIR into the main compilation graph; this simplifies the 
transformation process for a multitude of external libraries.
+
+Alongside this, adding further hooks means external code generation can gain 
benefits from the normal `lower` and `build` flow in TVM. This then expands to 
exposing more granular methods in the driver api to leverage the compiler 
passes in TVM, similar to how they've been exposed in 
https://github.com/apache/tvm/pull/8110 with `lower_primfunc` and 
`lower_schedule`. This is then regulated by the normal Target mechanism to 
route as appropriate.
+
+Refactoring the target splitting logic into `build_module.cc` alongside any 
external module generation makes this a first class series of hooks into a 
simplified compilation flow; this would enable the removal of external 
generators from executor code generators which currently proxy to 
`te_compiler.cc`. Eventually this could also be used for CPU/GPU split as a 
specialisation of a `Target`/`Target`s split.
diff --git a/rfcs/assets/000x/bypass.png b/rfcs/assets/000x/bypass.png
new file mode 100644
index 0000000..cf725e1
Binary files /dev/null and b/rfcs/assets/000x/bypass.png differ
