[GitHub] [tvm] adstraw commented on pull request #14329: [TensorIR] Support for L2 prefetch async copy and pred_guard enabled async in vectorized if_then_else

via GitHub Wed, 22 Mar 2023 16:00:41 -0700


adstraw commented on PR #14329:
URL: https://github.com/apache/tvm/pull/14329#issuecomment-1480362460

The same TIR annotation `tir.use_async_copy` triggers both the
`InjectPTXAsyncCopy` pass for CUDA codegen and the `LowerAsyncDMA` pass used
by Hexagon. This TIR annotation and its behavior are legacy; they were NOT
changed by commit `c6c89c3`. See
[here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L239)
and
[here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L588).

The advantage of reusing the `tir.use_async_copy` TIR annotation is that we
have a converged way of handling async copy across multiple devices: async
copies lower to PTX for CUDA and to DMA for other devices, e.g. Hexagon.

The disadvantage of reusing the `tir.use_async_copy` TIR annotation is that
(I believe) BOTH the `InjectPTXAsyncCopy` and `LowerAsyncDMA` passes are running
during CUDA codegen. This worked without issue in the past, but it seems that
recent changes in commit `c6c89c3` combined with this PR are exposing an issue.

`LowerAsyncDMA` is meant to be a generic pass that lowers async copies to DMA,
but it should probably have a target-specific opt-in for devices (like Hexagon)
that support this behavior, rather than running for all devices. If it is
within scope for this PR to make that change, please do. If not, please feel
free to revert `c6c89c3` so this PR can proceed.
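
To illustrate the opt-in idea, here is a minimal sketch in plain Python. It is
NOT TVM's driver code: the function names, the `DMA_TARGETS` set, and the
string-based target kinds are all hypothetical, and the flag-only function
only models the behavior I suspect above for CUDA.

```python
from typing import List

# Assumption: the set of targets that opt in to DMA lowering.
# Hexagon is the only such device named in this thread.
DMA_TARGETS = frozenset({"hexagon"})


def flag_only_passes(target_kind: str, use_async_copy: bool) -> List[str]:
    """Model of the suspected current behavior: the flag alone enables both passes."""
    if not use_async_copy:
        return []
    # target_kind is deliberately ignored here; that is the problem being described.
    return ["InjectPTXAsyncCopy", "LowerAsyncDMA"]


def gated_passes(target_kind: str, use_async_copy: bool) -> List[str]:
    """Model of the proposal: each pass also checks the target before running."""
    if not use_async_copy:
        return []
    passes = []
    if target_kind == "cuda":
        passes.append("InjectPTXAsyncCopy")
    if target_kind in DMA_TARGETS:
        passes.append("LowerAsyncDMA")
    return passes
```

With this gating, a CUDA build with `tir.use_async_copy` enabled would select
only `InjectPTXAsyncCopy`, and a Hexagon build would select only
`LowerAsyncDMA`.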

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

