mbs-octoml commented on a change in pull request #38:
URL: https://github.com/apache/tvm-rfcs/pull/38#discussion_r734031103



##########
File path: rfcs/00xx-improved-multi-target-handling.md
##########
@@ -0,0 +1,176 @@
+- Feature Name: improved-multi-target-handling
+- Start Date: 2021-09-20
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# Summary
+[summary]: #summary
+
+TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated on more than
+one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
+1. Relay programs may contain "on_device" annotations which specify that a sub-expression's result should
+   reside on a device with a given `DLDeviceType` (kDLCPU, kDLCUDA, etc).
+2. The device planning pass uses those annotations to decide on the unique device for every Relay sub-expression,
+   including every primitive operator call. Sub-expressions which are unconstrained are assigned to the 'default'
+   device. The pass then inserts "device_copy" operators whenever tensors need to cross device boundaries.
+3. The user/driver must also supply a list of `Target` objects. The compiler uses that list to build a `TargetMap`
+   from `DLDeviceType` to `Target` for all of those objects.
+4. Each call to a primitive operator for a particular `DLDeviceType` signals we need to compile ('lower') that
+   primitive for that device. The `Target` to use for that compilation is found from the `TargetMap`.
+
+This approach has 5 problems:
+1. TVM is being targeted to environments with multiple CPUs (eg Arm 'Big.LITTLE') and multiple tensor-friendly
+   devices (eg a GPU as well as an accelerator such as Arm 'Ethos-U'). This means a `DLDeviceType` no longer
+   uniquely determines a `Target`.
+2. Though TVM's `Device` abstraction (an alias for `dlpack`'s `DLDevice`) is a pair of a `DLDeviceType` and an
+   arbitrary 'device id', TVM does not consistently plumb the device id through annotations, passes and operators.
+   Thus currently we cannot use 'device id' to distinguish, eg, two CPUs in the same system.
+3. The codebase still uses an older `target` and `target_host` convention for distinguishing the main `Target` for
+   primitive operators from the `Target` for residual tensor computation, shape computation, and (for AOT) the
+   overall Relay control-flow. There's a fair bit of 'target normalization' scattered throughout the codebase to
+   deal with these different conventions.
+4. `Target`s are often manufactured on-the-fly (eg to represent the default 'CPU' target on which shape computations
+   should be hosted). However there's no guarantee those default `Target`s will match up with the user-supplied
+   `Target`s, thus it's possible to end up with `"llvm"` and `"llvm -m ..."` `Targets` coexisting. Now that
+   `IRModule` uses `Target` objects themselves to distinguish which `PrimFunc`s are intended for which targets,
+   it is particularly important to ensure there's a single source of truth for available `Target`s.
+5. TVM also supports a 'BYOC' extension mechanism. This allows `"target.<target name>"` annotations to be placed on
+   primitive operations to indicate they should possibly be compiled with the matching BYOC toolchain. A target
+   annotation pass uses those annotations to decide on a target name for every Relay sub-expression. A partition graph
+   pass then inserts function call boundaries whenever execution needs to cross target boundaries. However this
+   machinery is separate from and incompatible with the "on_device" mechanism, and 'target names' are a separate
+   concept from `Target` objects.
+
+In this RFC we tackle problems 1-4. We won't directly take on 5 since it involves more moving parts, but our hope
+is for this RFC to clear the way to taking on 5 in the future.
+
+Our proposal is:
+1. Extend `Target` to have a `DLDeviceType` attribute.
+2. Allow `Target` objects to be registered under a globally unique target label. Registration may be 'static' (ie

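The `DLDeviceType`-keyed lookup described in steps 3-4 of the excerpt, and the collision problem 1 identifies, can be sketched in plain Python. This is purely illustrative: names like `build_target_map` and `lower_primitive` are invented here and are not TVM's actual API.

```python
# Illustrative sketch of the TargetMap flow from the excerpt above.
# All names are hypothetical, not TVM's real code.
from dataclasses import dataclass
from enum import Enum


class DLDeviceType(Enum):
    kDLCPU = 1
    kDLCUDA = 2


@dataclass(frozen=True)
class Target:
    kind: str                   # eg "llvm", "cuda"
    device_type: DLDeviceType


def build_target_map(targets):
    """Step 3: index the user-supplied targets by their DLDeviceType."""
    return {t.device_type: t for t in targets}


def lower_primitive(op_name, device_type, target_map):
    """Step 4: find the Target with which to lower a primitive call."""
    target = target_map[device_type]
    return f"lowered {op_name} for {target.kind}"


targets = [Target("llvm", DLDeviceType.kDLCPU), Target("cuda", DLDeviceType.kDLCUDA)]
tmap = build_target_map(targets)

# Problem 1 in a nutshell: appending a second kDLCPU Target to `targets`
# would silently overwrite the first entry in this dict, since the device
# type alone no longer uniquely determines a Target.
```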
Review comment:
       So for 'static' we already have the big-list-o-nvidia-targets (see target/tag.cc). I could imagine that's a pattern we keep building on. But as I started implementing this I quickly realized there's no need to introduce yet another name-to-thing binding mechanism for targets/devices etc.
    - We already have the tag, so you can write Target("nvidia/tesla-k40") and get that pre-defined target.
    - If you want to programmatically construct on_device annotations or anything else that needs to refer to specific targets, well, just use the host language to pass them around.
    - If you want to write Relay script containing on_device annotations, use the metatable mechanism to bind to specific targets etc and refer to them by name in the text.
   
   I've zapped all the target labels stuff from the RFC.
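The tag-style lookup the comment prefers (the `Target("nvidia/tesla-k40")` pattern, with pre-defined tags in target/tag.cc) can be sketched as a name-keyed registry. Again, this is a hypothetical illustration: `TAG_REGISTRY`, `lookup_target`, and the tag names below are invented, not TVM's real registry.

```python
# Hypothetical sketch of name-keyed target tags, mirroring the
# Target("nvidia/tesla-k40") pattern described above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    kind: str
    mattr: str = ""


# A globally unique name keys each pre-defined target, so two CPU
# targets (eg big.LITTLE cores) coexist without any ambiguity --
# unlike a map keyed by DLDeviceType alone.
TAG_REGISTRY = {
    "example/big-core": Target("llvm", "-mcpu=cortex-a72"),
    "example/little-core": Target("llvm", "-mcpu=cortex-a53"),
}


def lookup_target(tag: str) -> Target:
    """Resolve a tag to its single pre-registered Target."""
    return TAG_REGISTRY[tag]
```

Because the binding already exists as tags, a separate 'target label' registry would duplicate this mechanism, which is the reviewer's point.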




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]