adstraw commented on PR #14329:
URL: https://github.com/apache/tvm/pull/14329#issuecomment-1480362460

   The same TIR annotation `tir.use_async_copy` is used to trigger both the 
`InjectPTXAsyncCopy` pass for CUDA codegen as well as `LowerAsyncDMA` pass used 
by Hexagon.  This is TIR annotation and behavior is legacy --- it is NOT 
changed with commit `c6c89c3`.  See 
[here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L239)
 and 
[here](https://github.com/apache/tvm/blob/91428158f2053dd4cda912ae6b7b2fd6797964b0/src/driver/driver_api.cc#L588).
   
   The advantage of reusing the `tir.use_async_copy` TIR annotation is that we 
have a converged way of handling async copy across multiple devices with async 
copies lowering to PTX for CUDA and DMA e.g. for Hexagon.  
   
   The disadvantage of reusing the `tir.use_async_copy` TIR annotation is that 
(I believe) BOTH `InjectPTXAsycCopy` and `LowerAsyncDMA` passes are running 
during CUDA codegen.  This worked without issue in the past, but it seems that 
recent changes in commit `c6c89c3` combined with this PR are exposing an issue.
   
   `LowerAsyncDMA` is meant to be a generic pass to lower async copies to DMA 
but it should probably have a target specific opt-in for devices (like Hexagon) 
that support this behavior rather than running for all devices.  If it is 
within scope for this PR to make that change, please do.  If not, please feel 
free to revert `c6c89c3` so this PR can proceed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to