mbs-octoml commented on a change in pull request #38:
URL: https://github.com/apache/tvm-rfcs/pull/38#discussion_r734004042
########## File path: rfcs/00xx-improved-multi-target-handling.md ##########
@@ -0,0 +1,176 @@
+- Feature Name: improved-multi-target-handling
+- Start Date: 2021-09-20
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# Summary
+[summary]: #summary
+
+TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated on more than
+one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
+1. Relay programs may contain "on_device" annotations which specify that a sub-expression's result should
+   reside on a device with a given `DLDeviceType` (kDLCPU, kDLCUDA, etc).
+2. The device planning pass uses those annotations to decide on the unique device for every Relay sub-expression,
+   including every primitive operator call. Sub-expressions which are unconstrained are assigned to the 'default'
+   device. The pass then inserts "device_copy" operators whenever tensors need to cross device boundaries.
+3. The user/driver must also supply a list of `Target` objects. The compiler uses that list to build a `TargetMap`
+   from `DLDeviceType` to `Target` for all of those objects.
+4. Each call to a primitive operator for a particular `DLDeviceType` signals we need to compile ('lower') that
+   primitive for that device. The `Target` to use for that compilation is found from the `TargetMap`.
+
+This approach has 5 problems:
+1. TVM is being targeted to environments with multiple CPUs (eg Arm 'big.LITTLE') and multiple tensor-friendly
+   devices (eg a GPU as well as an accelerator such as Arm 'Ethos-U'). This means a `DLDeviceType` no longer
+   uniquely determines a `Target`.
+2. Though TVM's `Device` abstraction (an alias for `dlpack`'s `DLDevice`) is a pair of a `DLDeviceType` and an
+   arbitrary 'device id', TVM does not consistently plumb the device id through annotations, passes and operators.
+   Thus currently we cannot use 'device id' to distinguish, eg, two CPUs in the same system.
+3. The codebase still uses an older `target` and `target_host` convention for distinguishing the main `Target` for
+   primitive operators from the `Target` for residual tensor computation, shape computation, and (for AOT) the
+   overall Relay control-flow. There's a fair bit of 'target normalization' scattered throughout the codebase to
+   deal with these different conventions.
+4. `Target`s are often manufactured on-the-fly (eg to represent the default 'CPU' target on which shape computations
+   should be hosted). However there's no guarantee those default `Target`s will match up with the user-supplied
+   `Target`s, thus it's possible to end up with `"llvm"` and `"llvm -m ..."` `Target`s coexisting. Now that
+   `IRModule` uses `Target` objects themselves to distinguish which `PrimFunc`s are intended for which targets,
+   it is particularly important to ensure there's a single source of truth for available `Target`s.
+5. TVM also supports a 'BYOC' extension mechanism. This allows `"target.<target name>"` annotations to be placed on
+   primitive operations to indicate they should possibly be compiled with the matching BYOC toolchain. A target
+   annotation pass uses those annotations to decide on a target name for every Relay sub-expression. A partition
+   graph pass then inserts function call boundaries whenever execution needs to cross target boundaries. However
+   this machinery is separate from and incompatible with the "on_device" mechanism, and 'target names' are a
+   separate concept from `Target` objects.
+
+In this RFC we tackle problems 1-4. We won't directly take on 5 since it involves more moving parts, but our hope
+is for this RFC to clear the way to taking on 5 in the future.
+
+Our proposal is:
+1. Extend `Target` to have a `DLDeviceType` attribute.
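The `TargetMap` keyed by `DLDeviceType` (step 3 above) is exactly where problems 1 and 2 bite. The following plain-Python sketch is illustrative only: it does not use TVM's actual classes, and the target strings and `-mcpu` values are made-up examples. It shows how a map keyed by device type alone collides on a big.LITTLE-style system with two CPU targets, while keying by the full `DLDevice` pair `(device_type, device_id)` would keep them distinct (assuming device ids were plumbed through consistently):

```python
# Illustrative sketch, not TVM's real data structures.
# Device type codes below match dlpack's DLDeviceType enum.
kDLCPU = 1
kDLCUDA = 2

def build_target_map(targets):
    """Build a TargetMap from DLDeviceType -> target, as in step 3.

    targets: list of (device_type, target_string) pairs supplied by the user.
    Raises ValueError when two targets share one device type (problem 1).
    """
    target_map = {}
    for device_type, target in targets:
        if device_type in target_map:
            # Two distinct targets share one DLDeviceType: the map cannot
            # represent both, so we must either error or silently drop one.
            raise ValueError(f"ambiguous targets for device type {device_type}")
        target_map[device_type] = target
    return target_map

# A homogeneous CPU + GPU setup works fine:
ok = build_target_map([(kDLCPU, "llvm"), (kDLCUDA, "cuda")])
assert ok[kDLCPU] == "llvm"

# Two CPU targets (hypothetical big.LITTLE cores) collide:
try:
    build_target_map([(kDLCPU, "llvm -mcpu=cortex-a76"),
                      (kDLCPU, "llvm -mcpu=cortex-a55")])
    collided = False
except ValueError:
    collided = True
assert collided

# Keying by the full DLDevice pair (device_type, device_id) instead would
# distinguish the two CPUs -- if device ids were consistently plumbed through
# annotations and passes (problem 2):
by_device = {(kDLCPU, 0): "llvm -mcpu=cortex-a76",
             (kDLCPU, 1): "llvm -mcpu=cortex-a55"}
assert len(by_device) == 2
```

This is also why the proposal attaches a `DLDeviceType` attribute to `Target` itself: the `Target` list becomes the single source of truth, rather than a lossy `DLDeviceType`-keyed map derived from it.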
Review comment:
So I've rejigged to take on 5 and put less emphasis on the target wrangling, which I think will work itself out through a combination of @Mousius's work and incremental cleanup. Deep breath.
