mbs-octoml commented on a change in pull request #10:
URL: https://github.com/apache/tvm-rfcs/pull/10#discussion_r694978737



##########
File path: rfcs/0010-additional-target-hooks.md
##########
@@ -0,0 +1,155 @@
+Feature Name: additional-target-hooks
+Start Date: 2021-07-14
+RFC PR: apache/tvm-rfcs#10
+GitHub Issue: apache/tvm#8589
+
+# Summary
+[summary]: #summary
+
+In order to enable flexibility in how individual targets are lowered and built 
within TVM, this RFC proposes supporting additional hooks on the `Target` and 
that the target becomes the central place for such hooks, for example:
+
+```
+TVM_REGISTER_TARGET_KIND("cmsisnn", kDLCPU)
+    .set_attr<String>("relay_to_tir", "target.cmsisnn.lower")
+    .set_attr<String>("tir_to_runtime", "target.cmsisnn.build");
+```
+
+This defines two new hooks as attributes on the target, referencing functions 
registered into the central TVM registry. In similar fashion, external 
generators (currently accessed directly in the compile engine) would be grouped 
with an appropriate `Target` as well:
+
+```
+TVM_REGISTER_TARGET_KIND("ethos-n", kDLCPU)
+    .set_attr<String>("relay_to_runtime", "relay.ext.ethos-n")
+    .set_attr<String>("constant_updater", 
"relay.ext.ethos-n.constant_updater");
+```
+
+Collecting all targets under the `Target` functionality and making it clearer 
which hooks apply to each target.
+
+# Motivation
+[motivation]: #motivation
+
+Currently to introduce an external code generator, the entire compilation 
pipeline must be recreated; this is necessary for some targets but in the case 
of simply re-using existing libraries or introducing a function call to use for 
an operator it can become more than is necessary. It also exists outside of the 
main `PrimFunc`, meaning it can't be inspected as part of the entire main 
graph; this limits the effectiveness of techniques such as memory planning. By 
introducing the hook `relay_to_tir`, which is similar to the default `lower` 
pass in that it returns TIR, it can be inspected by the memory planner and 
other analysis passes that only work at the TIR level. If all that is necessary 
is transforming into a flat `call_extern` (such is the case for the [CMSIS NN 
Softmax 
function](https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/NN/Source/SoftmaxFunctions/arm_softmax_s8.c#L86))
 then this can be left represented as TIR and be collected by the host code 
generation.
+
+In the more complex case, we still want to take advantage of memory planning 
by using `relay_to_tir` and inspecting the liveness within the TIR graph, but 
instead want to generate out more complex calls (such as using the [CMSIS NN 
Structures](https://github.com/ARM-software/CMSIS_5/blob/def6f800f95661eb3451d317f7d0dde504f6020d/CMSIS/NN/Include/arm_nn_types.h#L81-L90));
 the `tir_to_runtime` hook can be used to build our intermediary TIR into a 
Runtime module similarly to how the existing external code generation works. 
This allows writing of external code generators that also get the benefits of 
any intermediary analysis or transformation that TVM offers. Alongside being 
able to use the analysis passes, code generators can extend from existing host 
code generators, customising only the generation which is relevant to them and 
gaining maximum benefit from the existing optimisations made in TVM.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+As a user, you can pick from additional hooks to bypass certain behaviours of 
the `Target`:
+* `relay_to_tir` - Custom lowering direct to TIR
+* `tir_to_runtime` - Custom code generation into a runtime module from TIR
+* `relay_to_runtime` - Full compilation flow from Relay to a runtime module
+
+To illustrate where the hooks are placed, please refer to the following 
diagram:
+
+![Diagram showing the splitting point of relay_to_runtime, relay_to_tir and 
tir_to_runtime](./assets/000x/bypass.png)
+
+These can be registered on targets using `set_attr`:
+```
+TVM_REGISTER_TARGET_KIND("cmsisnn", kDLCPU)
+    .set_attr<String>("relay_to_tir", "target.cmsisnn.lower")
+    .set_attr<String>("tir_to_runtime", "target.cmsisnn.build");
+
+TVM_REGISTER_TARGET_KIND("ethos-n", kDLCPU)
+    .set_attr<String>("relay_to_runtime", "relay.ext.ethos-n")
+    .set_attr<String>("constant_updater", 
"relay.ext.ethos-n.constant_updater");
+```
+
+## Relay -> TIR
+With this change, this path splits, depending on whether you wanted to 
generate a full `Module` or introduce some specific TIR nodes into the code 
generation flow; the addition of the `relay_to_tir` hook allows you to write 
trivial external TIR generators such as calling out to a third party library:
+```python
[email protected]_func("target.woofles.lowering")
+def tir_generator(ir_module, relay_func):
+    """A simple TIR generator for testing"""
+    ib = tvm.tir.ir_builder.create()
+    A = tvm.tir.decl_buffer(shape=(8,8,), 
dtype=relay_func.params[0].checked_type.dtype)
+    B = tvm.tir.decl_buffer(shape=(8,8,), 
dtype=relay_func.params[0].checked_type.dtype)
+    C = tvm.tir.decl_buffer(shape=(8,8,), 
dtype=relay_func.params[0].checked_type.dtype)
+    ib.emit(
+        tvm.tir.call_extern('int32', 'woofles', A.data, B.data, 8, 8, C.data)
+    )
+
+    prim_func = tvm.tir.PrimFunc([A, B, C], ib.get())
+    new_module = tvm.lower(prim_func, name=relay_func.attrs["global_symbol"])
+
+    return new_module, GlobalVar(relay_func.attrs["global_symbol"])
+```
+This is then registered on a target:
+```
+TVM_REGISTER_TARGET_KIND("woofles", kDLCPU)
+    .set_attr<String>("relay_to_tir", "target.woofles.lowering");
+```
+The signature for this hook is as follows:
+```
+relay_to_tir(const IRModule& ir_module, const relay::Function& function) -> 
(IRModule, GlobalVar)
+```
+Which takes a read only `IRModule` and relevant `Function` and returns a new 
`IRModule` which represents the transformed function, alongside a `GlobalVar` 
which indicates the top-level operator function within that new `IRModule`.
+
+## TIR -> Runtime
+Extending from the above, a second hook is introduced to do further 
transformations from TIR -> Runtime named `tir_to_runtime`, this bypasses the 
default `target.build.X` and instead uses the registered `tir_to_runtime` build:
+```
+runtime::Module BuildWooflesHost(IRModule mod, Target target) {
+// ... Custom Code generation here
+}
+
+TVM_REGISTER_GLOBAL("target.build.woofles").set_body_typed(BuildWooflesHost);
+TVM_REGISTER_TARGET_KIND("woofles", kDLCPU)
+    .set_attr<String>("tir_to_runtime", "target.build.woofles");
+```
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+This functionality is an extension of the existing use of `attr::kCompiler` to 
provide a hint that we can use to lookup attached target attribute, the compile 
engine and code generation flows can choose to store TIR and/or generate 
runtime modules based on the registered hooks.
+
+## Relay to TIR Hook
+[relay-to-tir-hook]: #relay-to-tir-hook
+
+This can be added to the TE Compiler by cross referencing the existing 
`attr::kCompiler` with the `TargetKind` registry:

Review comment:
       That sounds sensible. I need to pin Junru down for an hour or so to get 
a tour of AutoTVM to understand how subtle the extension points would need to 
be.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to