mbs-octoml commented on a change in pull request #62:
URL: https://github.com/apache/tvm-rfcs/pull/62#discussion_r835663739
########## File path: rfcs/xxxx-collage.md ##########
@@ -0,0 +1,987 @@
+# Design Doc: Collage [Draft 0.8]
+
+```
+Feature Name: Collage
+Start Date: Mar 2022
+Authors: Mark Shields ([email protected])
+RFC PR: <tbd>
+GitHub Issue: <tbd>
+
+History:
+- v0.7: First draft.
+- v0.8: Rework to emphasise 'partitioning' (quite early in pipeline) instead of 'fusion' (quite late in pipeline).
+```
+
+This design doc (with accompanying
+['v2' prototype implementation](https://github.com/mbs-octoml/mbs-tvm/tree/mbs-collage-sketch))
+shows how to bring tuning to TVM's BYOC partitioning passes. The tuning search explores the choice of sub-graphs (aka '
+partitions') and toolchains (aka 'backends') so as to minimize the expected model inference latency. Both 'graph
+style' (eg TensorRT) and 'library style' (eg DNNL) BYOC integrations are supported. We call the result an 'optimal
+partitioning'. This new tuning layer complements the tuning traditionally done by TVM and other toolchains during
+lowering. It can also complement any global tuning, for example to explore the choice of layout convention or device
+assignment.
+
+The approach is based on the [preprint](https://arxiv.org/pdf/2111.00655.pdf):
+
+> *Collage: Automated Integration of Deep Learning Backends*
+> Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia
+
+(See Appendix A for a comparison of this proposal and the paper's implementation. See Appendix D for TODO items in the '
+v2' prototype.)
+
+This tuning approach contrasts with TVM's existing "greedy" and "manual" approaches to partitioning:
+
+- Greedy: Currently only the largest possible supported sub-graphs are used for partitions, irrespective of their
+  execution time. With Collage many more candidate sub-graphs are explored, and it is possible for two smaller
+  sub-graphs to yield better overall latency than one large sub-graph if they mix toolchains.
+- Manual: Currently the TVM user must commit to a BYOC toolchain and invoke the corresponding
+  `partition_for_<toolchain>` function before the main TVM compilation flow begins. With Collage the choice of toolchain
+  can be automated based on measured latency. Collage will also explore mixing and matching between multiple BYOC
+  toolchains as well as TVM's native backend.
+
+When Collage is enabled it subsumes the existing `MergeComposite`/`AnnotateTarget`/`MergeCompilerRegions`/

Review comment:
Though I've not followed every section to the letter, I did rearrange the doc to match most of the template.
