adstraw commented on PR #14329: URL: https://github.com/apache/tvm/pull/14329#issuecomment-1480362460
The same TIR annotation `tir.use_async_copy` is used to trigger both the `InjectPTXAsyncCopy` pass for CUDA codegen as well as `LowerAsyncDMA` pass used by Hexagon. This is TIR annotation and behavior is legacy --- it is NOT changed with commit `c6c89c3`. See [here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L239) and [here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L588). The advantage of reusing the `tir.use_async_copy` TIR annotation is that we have a converged way of handling async copy across multiple devices with async copies lowering to PTX for CUDA and DMA e.g. for Hexagon. The disadvantage of reusing the `tir.use_async_copy` TIR annotation is that (I believe) BOTH `InjectPTXAsycCopy` and `LowerAsyncDMA` passes are running during CUDA codegen. This worked without issue in the past, but it seems that recent changes in commit `c6c89c3` combined with this PR are exposing an issue. `LowerAsyncDMA` is meant to be a generic pass to lower async copies to DMA but it should probably have a target specific opt-in for devices (like Hexagon) that support this behavior rather than running for all devices. If it is within scope for this PR to make that change, please do. If not, please feel free to revert `c6c89c3` so this PR can proceed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
