comaniac opened a new pull request #5059: [Draft][BYOC] Annotation Target with Merging
URL: https://github.com/apache/incubator-tvm/pull/5059

This PR implements a Relay pass that annotates the target device. Unlike the existing annotation target pass (#4933), this pass implements the algorithm proposed in @mbaret's RFC (https://discuss.tvm.ai/t/relay-improved-graph-partitioning-algorithm/5830). In short, it greedily merges supported ops to minimize the number of generated subgraphs.

Some highlights and lowlights of this PR:

- The pass is general in that it supports multiple targets. For example, we can use ["dnnl", "trt"] to annotate the graph.
- The pass uses many utility functions that are supposed to be removed once https://discuss.tvm.ai/t/discuss-annotation-defined-subgraphs/5934 has been implemented.
- The pass supports multiple outputs, but a subgraph with multiple outputs cannot be partitioned at the moment because the partition pass does not support multiple outputs yet.
- The unit test uses exactly the same example as the RFC. The "add" nodes are the blue nodes and the "subtract" nodes are the red nodes in the RFC figure.
- I marked "subtract" as an unsupported op to demonstrate how this pass works, but we need a more suitable way to do so.

Here is the example graph:

```
def @main(%in_1: Tensor[(10, 10), float32], %in_2: Tensor[(10, 10), float32], %in_3: Tensor[(10, 10), float32], %in_4: Tensor[(10, 10), float32], %in_5: Tensor[(10, 10), float32], %in_6: Tensor[(10, 10), float32], %in_7: Tensor[(10, 10), float32], %in_8: Tensor[(10, 10), float32], %in_9: Tensor[(10, 10), float32], %in_10: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = add(%in_1, %in_2) /* ty=Tensor[(10, 10), float32] */;
  %1 = add(%in_3, %in_4) /* ty=Tensor[(10, 10), float32] */;
  %2 = add(%0, %1) /* ty=Tensor[(10, 10), float32] */;
  %3 = subtract(%in_5, %in_6) /* ty=Tensor[(10, 10), float32] */;
  %4 = subtract(%in_7, %3) /* ty=Tensor[(10, 10), float32] */;
  %5 = add(%2, %4) /* ty=Tensor[(10, 10), float32] */;
  %6 = subtract(%in_8, %5) /* ty=Tensor[(10, 10), float32] */;
  %7 = add(%in_9, %5) /* ty=Tensor[(10, 10), float32] */;
  %8 = add(%6, %7) /* ty=Tensor[(10, 10), float32] */;
  add(%in_10, %8) /* ty=Tensor[(10, 10), float32] */
}
```

After annotation with merge:

```
def @main(%in_1: Tensor[(10, 10), float32], %in_2: Tensor[(10, 10), float32], %in_3: Tensor[(10, 10), float32], %in_4: Tensor[(10, 10), float32], %in_5: Tensor[(10, 10), float32], %in_6: Tensor[(10, 10), float32], %in_7: Tensor[(10, 10), float32], %in_8: Tensor[(10, 10), float32], %in_9: Tensor[(10, 10), float32], %in_10: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = annotation.compiler_begin(%in_10, meta[relay.attrs.CompilerAttrs][0]) /* ty=Tensor[(10, 10), float32] */;
  %1 = annotation.compiler_begin(%in_1, meta[relay.attrs.CompilerAttrs][1]) /* ty=Tensor[(10, 10), float32] */;
  %2 = annotation.compiler_begin(%in_2, meta[relay.attrs.CompilerAttrs][2]) /* ty=Tensor[(10, 10), float32] */;
  %3 = add(%1, %2) /* ty=Tensor[(10, 10), float32] */;
  %4 = annotation.compiler_begin(%in_3, meta[relay.attrs.CompilerAttrs][3]) /* ty=Tensor[(10, 10), float32] */;
  %5 = annotation.compiler_begin(%in_4, meta[relay.attrs.CompilerAttrs][4]) /* ty=Tensor[(10, 10), float32] */;
  %6 = add(%4, %5) /* ty=Tensor[(10, 10), float32] */;
  %7 = add(%3, %6) /* ty=Tensor[(10, 10), float32] */;
  %8 = subtract(%in_5, %in_6) /* ty=Tensor[(10, 10), float32] */;
  %9 = subtract(%in_7, %8) /* ty=Tensor[(10, 10), float32] */;
  %10 = annotation.compiler_begin(%9, meta[relay.attrs.CompilerAttrs][5]) /* ty=Tensor[(10, 10), float32] */;
  %11 = add(%7, %10) /* ty=Tensor[(10, 10), float32] */;
  %12 = annotation.compiler_end(%11, meta[relay.attrs.CompilerAttrs][6]) /* ty=Tensor[(10, 10), float32] */;
  %13 = subtract(%in_8, %12) /* ty=Tensor[(10, 10), float32] */;
  %14 = annotation.compiler_begin(%13, meta[relay.attrs.CompilerAttrs][7]) /* ty=Tensor[(10, 10), float32] */;
  %15 = annotation.compiler_begin(%in_9, meta[relay.attrs.CompilerAttrs][8]) /* ty=Tensor[(10, 10), float32] */;
  %16 = add(%15, %11) /* ty=Tensor[(10, 10), float32] */;
  %17 = annotation.compiler_end(%16, meta[relay.attrs.CompilerAttrs][9]) /* ty=Tensor[(10, 10), float32] */;
  %18 = annotation.compiler_begin(%17, meta[relay.attrs.CompilerAttrs][10]) /* ty=Tensor[(10, 10), float32] */;
  %19 = add(%14, %18) /* ty=Tensor[(10, 10), float32] */;
  %20 = add(%0, %19) /* ty=Tensor[(10, 10), float32] */;
  annotation.compiler_end(%20, meta[relay.attrs.CompilerAttrs][11]) /* ty=Tensor[(10, 10), float32] */
}
```

I'll need to clean up the code and refactor the unit test before it can be reviewed and merged. Meanwhile, @mbaret, since you are also working on this pass, could you share your thoughts? We don't have to merge this PR if yours is almost done.

cc @zhiics
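
For readers who want to see the greedy merging idea in isolation, here is a minimal pure-Python sketch. It is not the code in this PR and does not use TVM. The names `GRAPH`, `SUPPORTED_OPS`, `is_acyclic`, `greedy_merge`, and `list_regions` are hypothetical, and nodes n0..n9 stand for %0..%9 in the example graph above. It assumes a single target, treats `add` as supported and `subtract` as unsupported, and accepts a merge only if collapsing the regions still leaves an acyclic graph.

```python
# Toy model of the example graph: node -> (op, parent nodes).
# Graph inputs (%in_1 .. %in_10) are omitted; keys are in topological order.
GRAPH = {
    "n0": ("add", []),
    "n1": ("add", []),
    "n2": ("add", ["n0", "n1"]),
    "n3": ("subtract", []),
    "n4": ("subtract", ["n3"]),
    "n5": ("add", ["n2", "n4"]),
    "n6": ("subtract", ["n5"]),
    "n7": ("add", ["n5"]),
    "n8": ("add", ["n6", "n7"]),
    "n9": ("add", ["n8"]),
}
SUPPORTED_OPS = {"add"}  # assumption: "subtract" is marked unsupported


def is_acyclic(graph, region_of):
    """Check that collapsing each region into one node leaves a DAG."""
    def group(n):
        # Nodes without a region are treated as their own group.
        return region_of.get(n, n)

    succ = {group(n): set() for n in graph}
    indeg = {g: 0 for g in succ}
    for node, (_, parents) in graph.items():
        for parent in parents:
            src, dst = group(parent), group(node)
            if src != dst and dst not in succ[src]:
                succ[src].add(dst)
                indeg[dst] += 1
    # Kahn's algorithm: every group gets visited only if there is no cycle.
    ready = [g for g, d in indeg.items() if d == 0]
    visited = 0
    while ready:
        g = ready.pop()
        visited += 1
        for nxt in succ[g]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return visited == len(indeg)


def greedy_merge(graph, supported_ops):
    """Map each supported node to a region id, merging greedily."""
    region_of = {}
    for node, (op, parents) in graph.items():
        if op not in supported_ops:
            continue
        region_of[node] = node  # start in a fresh single-node region
        for parent in parents:
            if parent not in region_of or region_of[parent] == region_of[node]:
                continue
            old, new = region_of[node], region_of[parent]
            trial = {n: (new if r == old else r) for n, r in region_of.items()}
            if is_acyclic(graph, trial):  # reject merges that create a cycle
                region_of = trial
    return region_of


def list_regions(region_of):
    """Return the regions as a sorted list of sorted member lists."""
    members = {}
    for node, region in region_of.items():
        members.setdefault(region, []).append(node)
    return sorted(sorted(m) for m in members.values())


print(list_regions(greedy_merge(GRAPH, SUPPORTED_OPS)))
```

On this toy graph the sketch produces two regions, {n0, n1, n2, n5, n7} and {n8, n9}. That lines up with the annotated output above, where %3, %6, %7, %11 and %16 share one region and %19 and %20 form the other. n8 cannot join the first region because the unsupported subtract n6 consumes n5 and feeds n8, so merging would make the region depend on its own output.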
