mbs-octoml commented on a change in pull request #38:
URL: https://github.com/apache/tvm-rfcs/pull/38#discussion_r734003087



##########
File path: rfcs/00xx-improved-multi-target-handling.md
##########
@@ -0,0 +1,176 @@
+- Feature Name: improved-multi-target-handling
+- Start Date: 2021-09-20
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# Summary
+[summary]: #summary
+
+TVM supports 'hetrogeneous' execution, whereby primitive operators may be (sequentially) evaluated on more than
+one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
+1. Relay programs may contain "on_device" annotations which specify that a sub-expressions's result should
+   reside on a device with a given `DLDeviceType` (kDLCPU, kDLCUDA, etc).
+2. The device planning pass uses those annotations to decide on the unique device for every Relay sub-expression,
+   including every primitive operator call. Sub-expressions which are unconstrained are assigned to the 'default'
+   device. The pass then inserts "device_copy" operators whenever tensors need to cross device boundaries.
+3. The user/driver must also supply a list of `Target` objects. The compiler uses that list to build a `TargetMap`
+   from `DLDeviceType` to `Target` for all of those objects.
+4. Each call to a primitive operator for a particular `DLDeviceType` signals we need to compile ('lower') that
+   primitive for that device. The `Target` to use for that compilation is found from the `TargetMap`.
+
+This approach has 5 problems:

Review comment:
       Now 6. Progress!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
