mbs-octoml commented on a change in pull request #38:
URL: https://github.com/apache/tvm-rfcs/pull/38#discussion_r734003087
##########
File path: rfcs/00xx-improved-multi-target-handling.md
##########
@@ -0,0 +1,176 @@
+- Feature Name: improved-multi-target-handling
+- Start Date: 2021-09-20
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# Summary
+[summary]: #summary
+
+TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated on more than
+one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
+1. Relay programs may contain "on_device" annotations which specify that a sub-expression's result should
+   reside on a device with a given `DLDeviceType` (kDLCPU, kDLCUDA, etc).
+2. The device planning pass uses those annotations to decide on the unique device for every Relay sub-expression,
+   including every primitive operator call. Sub-expressions which are unconstrained are assigned to the 'default'
+   device. The pass then inserts "device_copy" operators whenever tensors need to cross device boundaries.
+3. The user/driver must also supply a list of `Target` objects. The compiler uses that list to build a `TargetMap`
+   from `DLDeviceType` to `Target` for all of those objects.
+4. Each call to a primitive operator for a particular `DLDeviceType` signals we need to compile ('lower') that
+   primitive for that device. The `Target` to use for that compilation is found from the `TargetMap`.
+
+This approach has 5 problems:

Review comment:
Now 6. Progress!
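As a reading aid for steps 1 to 4 of the quoted summary, here is a minimal, hypothetical Python model of the non-BYOC flow. It is not TVM's implementation: `Target`, `build_target_map`, `plan_devices`, `insert_device_copies`, and `lower` are illustrative names, a program is reduced to a flat list of operator names, and device planning is simplified to "annotated device, else the default device". Only the `DLDeviceType` values (kDLCPU = 1, kDLCUDA = 2) follow dlpack.

```python
from dataclasses import dataclass
from enum import IntEnum


class DLDeviceType(IntEnum):
    # Values follow dlpack's DLDeviceType enum.
    kDLCPU = 1
    kDLCUDA = 2


@dataclass(frozen=True)
class Target:
    # Hypothetical stand-in for a TVM Target: a name plus the device type it compiles for.
    name: str
    device_type: DLDeviceType


def build_target_map(targets):
    """Step 3: build a map from DLDeviceType to the Target supplied for it."""
    target_map = {}
    for target in targets:
        # Keyed only by device type: a later Target for the same type replaces an earlier one.
        target_map[target.device_type] = target
    return target_map


def plan_devices(calls, annotations, default_device):
    """Step 2 (simplified): each call gets its annotated device, else the default device."""
    return [(op, annotations.get(op, default_device)) for op in calls]


def insert_device_copies(planned):
    """Step 2 (cont.): insert a device_copy wherever consecutive calls change device."""
    result = []
    prev_device = None
    for op, device in planned:
        if prev_device is not None and device != prev_device:
            # The copy moves the previous result onto the next call's device.
            result.append(("device_copy", device))
        result.append((op, device))
        prev_device = device
    return result


def lower(program, target_map):
    """Step 4: look up the Target for each primitive call's device in the TargetMap."""
    return [
        (op, target_map[device].name)
        for op, device in program
        if op != "device_copy"
    ]


if __name__ == "__main__":
    targets = [Target("llvm", DLDeviceType.kDLCPU), Target("cuda", DLDeviceType.kDLCUDA)]
    target_map = build_target_map(targets)
    # Step 1 (simplified): only conv2d is annotated; the rest fall back to the CPU default.
    planned = plan_devices(
        ["add", "conv2d", "relu"], {"conv2d": DLDeviceType.kDLCUDA}, DLDeviceType.kDLCPU
    )
    program = insert_device_copies(planned)
    print(lower(program, target_map))
```

Running the script prints `[('add', 'llvm'), ('conv2d', 'cuda'), ('relu', 'llvm')]`, with two `device_copy` entries in the intermediate program around `conv2d`. Because the sketch's map is keyed only by `DLDeviceType`, it can hold at most one `Target` per device type.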
