areusch commented on pull request #48:
URL: https://github.com/apache/tvm-rfcs/pull/48#issuecomment-1056363210


   @ccjoechou Summarizing our discussion a bit:
   
   - Marvell is interested in being able to arbitrarily partition a Relay graph into hardware-accelerated and non-hardware-accelerated parts
       - The boundaries between these parts are to be determined by the Marvell backend; therefore, some additional control is needed beyond the default behavior provided by MergeComposite
       - @mbs-octoml suggests that they use the [StopFusion 
annotation](https://github.com/apache/tvm/blob/main/src/relay/op/annotation/annotation.h#L38)
 to manually enforce the boundaries. These annotations could be added 
programmatically via a Relay IRModule pass. StopFusion is used in [FuseOps 
pass](https://github.com/apache/tvm/blob/main/src/relay/transforms/fuse_ops.cc#L896)
 to avoid fusion.
       - Using this approach, the Marvell partitioning pass defined here could 
be simplified and the existing fusion pass could be used.
   - Marvell needs to be able to determine which:
       - Imported ONNX operator is responsible for a given Relay node
       - Relay node is responsible for a TIR CallNode
       
       This needs to happen at two times:
       
       1. At compile time, to serve as a reference to the boundary nodes 
between a hardware-accelerated and non-hardware-accelerated subgraph
       2. At runtime, to determine which backend operator to call
       
        A follow-up question here from me: at runtime, couldn't you just emit the call to the correct backend operator? I wonder whether this mapping was needed because of earlier difficulty configuring the TVM partitioner (it would sometimes fuse across a desired boundary). Given the improved partitioning approach mentioned above, is it possible to avoid needing this reference at runtime?
       
        That doesn't solve the problem of needing to identify a Relay node at compile time. However, if we can reduce this to a purely compile-time need, perhaps we can develop an alternate way to refer to a Relay node from the Relay source code without adding an id to the IR. cc @tqchen @junrushao1994 in case they have ideas here.
   
        - Marvell proposes to add a Relay attribute exprnode_id and export it from the compiled artifact to identify the Relay nodes which are fused into a particular subgraph
       - More broadly, source maps (e.g. mapping TIR to Relay to frontend 
operator) would help here.
   - Right now the RFC proposes to create a new GraphExecutorCodegen. It might 
not be necessary to do this if we could export the exprnode_id for Relay 
operators passed to BYOC. A suggestion is to create a Marvell-specific 
runtime::Module modeled after 
[CUDAModule](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L137)
 which contains several distinct pieces of generated code. The exprnode_ids could be kept separate from any binary instructions if encoded this way. This pattern is common amongst GPU-offloaded runtime::Module implementations.
       - Additionally, note the 
[SaveToFile](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L70)
 override, which is invoked when `Module.save()` is called from Python. This allows you to walk the runtime::Module tree from Python and collect the various exprnode_ids into a single blob, e.g. JSON.
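The walk could look roughly like the following. A minimal stand-in class replaces tvm.runtime.Module here so the shape of the recursion is visible without a Marvell module to hand; real modules expose `imported_modules` the same way, and the per-module `exprnode_ids` accessor is an assumption about what the Marvell module would export:

```python
import json

class FakeModule:
    """Minimal stand-in for tvm.runtime.Module, just enough to show the
    tree walk (real modules expose imported_modules the same way)."""
    def __init__(self, type_key, exprnode_ids=None, imports=()):
        self.type_key = type_key
        self.exprnode_ids = exprnode_ids or []   # assumed Marvell-specific data
        self.imported_modules = list(imports)

def collect_exprnode_ids(mod):
    """Recursively gather exprnode_ids from a module tree into one dict
    keyed by module type_key, suitable for dumping as a single JSON blob."""
    out = {}
    if mod.exprnode_ids:
        out.setdefault(mod.type_key, []).extend(mod.exprnode_ids)
    for sub in mod.imported_modules:
        for key, ids in collect_exprnode_ids(sub).items():
            out.setdefault(key, []).extend(ids)
    return out

# Example tree: a host module importing two (hypothetical) Marvell modules.
tree = FakeModule("llvm", imports=[
    FakeModule("mrvl", [0, 3, 7]),
    FakeModule("mrvl", [12]),
])
blob = json.dumps(collect_exprnode_ids(tree))
```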
   - @jroesch to comment on rust CI failures
   - Marvell would like to contribute a simulator which can run in TVM CI to test their accelerator. We discussed adding the simulator either to ci-cpu or to a new ci-marvell image, how to do this, and the limitations of TVM CI.
   - Marvell runs a patched version of the TVM CI internally. A primary reason patching is needed is that many tests in the TVM CI require an internet connection (e.g. to download models), but their CI runs in a sandbox. It would be particularly helpful to mark such tests, e.g. via pytest.mark, to make them easy to skip. We also discussed pre-populating the download_testdata cache and patching pytest.skip into download_testdata on their internal fork. cc @leandron @driazati @konturn for visibility and in case they have ideas here.

