mbs-octoml commented on a change in pull request #62:
URL: https://github.com/apache/tvm-rfcs/pull/62#discussion_r834605893
########## File path: rfcs/xxxx-collage.md ##########
@@ -0,0 +1,987 @@

# Design Doc: Collage [Draft 0.8]

```
Feature Name: Collage
Start Date: Mar 2022
Authors: Mark Shields ([email protected])
RFC PR: <tbd>
GitHub Issue: <tbd>

History:
- v0.7: First draft.
- v0.8: Rework to emphasise 'partitioning' (quite early in the pipeline) instead of 'fusion' (quite late in the pipeline).
```

This design doc (with accompanying ['v2' prototype implementation](https://github.com/mbs-octoml/mbs-tvm/tree/mbs-collage-sketch)) shows how to bring tuning to TVM's BYOC partitioning passes. The tuning search explores the choice of sub-graphs (aka 'partitions') and toolchains (aka 'backends') so as to minimize the expected model inference latency. Both 'graph style' (eg TensorRT) and 'library style' (eg DNNL) BYOC integrations are supported. We call the result an 'optimal partitioning'. This new tuning layer complements the tuning traditionally done by TVM and other toolchains during lowering. It can also complement any global tuning, for example to explore the choice of layout convention or device assignment.

The approach is based on the [preprint](https://arxiv.org/pdf/2111.00655.pdf):

> *Collage: Automated Integration of Deep Learning Backends*
> Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia

(See Appendix A for a comparison of this proposal and the paper's implementation. See Appendix D for TODO items in the 'v2' prototype.)

This tuning approach contrasts with TVM's existing "greedy" and "manual" approaches to partitioning:

- Greedy: Currently only the largest possible supported sub-graphs are used for partitions, irrespective of their execution time. With Collage many more candidate sub-graphs are explored, and it is possible for two smaller sub-graphs to yield better overall latency than one large sub-graph if they mix toolchains.
- Manual: Currently the TVM user must commit to a BYOC toolchain and invoke the corresponding `partition_for_<toolchain>` function before the main TVM compilation flow begins. With Collage the choice of toolchain can be automated based on measured latency. Collage will also explore mixing and matching between multiple BYOC toolchains as well as TVM's native backend.

When Collage is enabled it replaces the existing `MergeComposite`/`AnnotateTarget`/`MergeCompilerRegions`/`PartitionGraph` passes embedded within each `partition_for_<toolchain>` function with a single new `CollagePartitioner` pass. The pass is guided by the list of available `Target`s and three existing sources:

1. The `"TOpPattern"` attributes provided for every Relay operator and used by TVM's built-in `FuseOps` pass.
2. The BYOC `"target.<toolchain>"` operator predicates provided for some operator/toolchain pairs by 'operator-based' BYOC integrations.
3. The BYOC operator patterns/predicates (usually) registered in the pattern table by 'pattern-based' BYOC integrations.

Review comment:

Yes, only for backwards compat! Meanwhile Michalis has converted TensorRT from operator- to pattern-based form, which is the only in-tree example. Once that is in I think I'll just drop support in Collage for the operator predicate mechanism entirely. (Whether we drop it from TVM itself is an entirely separate question.)

There's a very specific problem with the predicate-based approach which makes me not want to have to support it. In TRT the predicates were generally not specific about whether op args needed to be constants or not, and Collage was happily building candidates which the TRT builder (at model run time!) would assert-fail on. So I had to go through those and make the requirement explicit. But even then Collage needs to understand how much of the sub-graph the predicate is being called on is critical to the predicate, which requires a tiny nested search with its own extraction etc.
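The constant-argument problem can be illustrated with a small schematic sketch. This deliberately does not use the real TVM API: `Var`, `Constant`, and `Call` below are toy stand-ins for the corresponding Relay expression nodes, and both predicate functions are hypothetical, written only to contrast a lax predicate with one that makes the constant-weight requirement explicit.

```python
class Var:
    """Stand-in for a Relay variable (value known only at run time)."""

class Constant:
    """Stand-in for a Relay constant (value known at compile time)."""

class Call:
    """Stand-in for a Relay call node with an operator name and args."""
    def __init__(self, op, args):
        self.op = op
        self.args = args

def lax_trt_predicate(call):
    # Approves the op unconditionally. A candidate built from this may
    # still assert-fail inside the TRT builder at model run time if the
    # weights turn out not to be constant.
    return call.op == "nn.conv2d"

def strict_trt_predicate(call):
    # Makes the constant-weight requirement explicit, so the partitioner
    # never builds a candidate the TRT builder cannot actually compile.
    return call.op == "nn.conv2d" and isinstance(call.args[1], Constant)

conv_with_var_weights = Call("nn.conv2d", [Var(), Var()])
conv_with_const_weights = Call("nn.conv2d", [Var(), Constant()])

assert lax_trt_predicate(conv_with_var_weights)         # over-approves
assert not strict_trt_predicate(conv_with_var_weights)  # rejected up front
assert strict_trt_predicate(conv_with_const_weights)
```

The remaining difficulty the comment mentions is orthogonal to this fix: even a strict predicate is evaluated against some enclosing sub-graph, so the partitioner still has to work out how much of that sub-graph the predicate's answer actually depends on.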
All very ugly and I'd be very happy to be done with it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
