areusch commented on pull request #48:
URL: https://github.com/apache/tvm-rfcs/pull/48#issuecomment-1056363210


   @ccjoechou Summarizing our discussion a bit:
   
   - Marvell is interested in being able to arbitrarily partition a Relay graph into hardware-accelerated and non-hardware-accelerated parts
       - The boundaries between these parts are to be determined by the Marvell backend; therefore, some additional control is needed beyond the default behavior provided by MergeComposite
       - @mbs-octoml suggests that they use the [StopFusion 
annotation](https://github.com/apache/tvm/blob/main/src/relay/op/annotation/annotation.h#L38)
 to manually enforce the boundaries. These annotations could be added 
programmatically via a Relay IRModule pass. StopFusion is used in [FuseOps 
pass](https://github.com/apache/tvm/blob/main/src/relay/transforms/fuse_ops.cc#L896)
 to avoid fusion.
       - Using this approach, the Marvell partitioning pass defined here could 
be simplified and the existing fusion pass could be used.
   - Marvell needs to be able to determine which:
       - Imported ONNX operator is responsible for a given Relay node
       - Relay node is responsible for a TIR CallNode
       
       This needs to happen at two times:
       
       1. At compile time, to serve as a reference to the boundary nodes 
between a hardware-accelerated and non-hardware-accelerated subgraph
       2. At runtime, to determine which backend operator to call
       
        A follow-up question here from me: at runtime, couldn't you just emit the call to the correct backend operator? I wonder whether this mapping was needed because of earlier difficulty configuring the TVM partitioner (it would sometimes fuse across a desired boundary). Given the improved partitioning approach mentioned above, is it possible to avoid needing this reference at runtime?
       
        That doesn't solve the problem of needing to identify a Relay node at compile time. However, if we can reduce this to a purely compile-time need, perhaps we can develop an alternate way to refer to a Relay node from the Relay source code without adding an id to the IR. cc @tqchen @junrushao1994 in case they have ideas here.
   
        - Marvell proposes to add a Relay attribute exprnode_id and export it from the compiled artifact to identify the Relay nodes which are fused into a particular subgraph
       - More broadly, source maps (e.g. mapping TIR to Relay to frontend 
operator) would help here.
   - Right now the RFC proposes to create a new GraphExecutorCodegen. It might 
not be necessary to do this if we could export the exprnode_id for Relay 
operators passed to BYOC. A suggestion is to create a Marvell-specific 
runtime::Module modeled after 
[CUDAModule](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L137)
 which contains several distinct pieces of generated code. The exprnode_ids could be kept separate from any binary instructions if encoded this way. This pattern is common amongst GPU-offloaded runtime::Module implementations.
       - Additionally, note the 
[SaveToFile](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L70)
 override, which is invoked when `Module.save()` is called from Python. This allows you to walk the runtime::Module tree from Python and collect the various exprnode_ids into a single blob, e.g. JSON.
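The walk could look roughly like the following. A minimal stand-in class replaces tvm.runtime.Module here so the shape of the recursion is visible without a Marvell module to hand; real modules expose `imported_modules` the same way, and the per-module `exprnode_ids` accessor is an assumption about what the Marvell module would export:

```python
import json

class FakeModule:
    """Minimal stand-in for tvm.runtime.Module, just enough to show the
    tree walk (real modules expose imported_modules the same way)."""
    def __init__(self, type_key, exprnode_ids=None, imports=()):
        self.type_key = type_key
        self.exprnode_ids = exprnode_ids or []   # assumed Marvell-specific data
        self.imported_modules = list(imports)

def collect_exprnode_ids(mod):
    """Recursively gather exprnode_ids from a module tree into one dict
    keyed by module type_key, suitable for dumping as a single JSON blob."""
    out = {}
    if mod.exprnode_ids:
        out.setdefault(mod.type_key, []).extend(mod.exprnode_ids)
    for sub in mod.imported_modules:
        for key, ids in collect_exprnode_ids(sub).items():
            out.setdefault(key, []).extend(ids)
    return out

# Example tree: a host module importing two (hypothetical) Marvell modules.
tree = FakeModule("llvm", imports=[
    FakeModule("mrvl", [0, 3, 7]),
    FakeModule("mrvl", [12]),
])
blob = json.dumps(collect_exprnode_ids(tree))
```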
   - @jroesch to comment on rust CI failures
   - Marvell would like to contribute a simulator which can run in TVM CI to test their accelerator. We discussed adding the simulator either to ci-cpu or to a new ci-marvell image, how to do this, and the limitations of TVM CI.
   - Marvell runs a patched version of the TVM CI internally. A primary reason patching is needed is that many tests in the TVM CI require an internet connection (e.g. to download models), but their CI runs in a sandbox. It would be particularly helpful to mark such tests, e.g. via pytest.mark, to make them easy to skip. We also discussed pre-populating the download_testdata cache and patching pytest.skip into download_testdata on their internal fork. cc @leandron @driazati @konturn for visibility and in case they have ideas here.

