t-vi opened a new pull request #5727:
URL: https://github.com/apache/incubator-tvm/pull/5727


   This adds warp shuffle intrinsics to ROCm and enables reductions.
   
   - There was at least one hardcoded assumption of 32 threads per warp in 
`lower_thread_allreduce`.
   - I have tentatively hijacked a couple of CUDA codegen tests which were 
useful in verifying that the ROCm path works. I'm not quite sure what to do 
with that. Having ROCm tests in test_target_codegen_cuda might not be 
intuitive, but code duplication is bad, too. The tests helped find the 
hardcoded 32-thread assumption mentioned above.
   - This is my first meddling with intrinsics and producing expressions, so 
bear with me.
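
   To illustrate why a hardcoded warp size of 32 matters here, below is a 
minimal pure-Python sketch of a shuffle-down tree reduction. It is not the code 
in `lower_thread_allreduce`; the helper names `shuffle_down` and 
`warp_reduce_sum` are hypothetical, and the 64-lane case assumes an AMD GCN-style 
wavefront. Lane 0 only ends up with the full sum when the starting offset is 
derived from the actual warp size.

```python
def shuffle_down(values, offset):
    """Simulate a warp shuffle-down: lane i reads the value of lane i + offset.

    Lanes whose source lane is out of range keep their own value.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def warp_reduce_sum(values, warp_size):
    """Tree reduction using shuffle-down; returns what lane 0 holds at the end.

    `warp_size` is the width the reduction *assumes*, which may differ from
    the number of lanes actually present in `values`.
    """
    vals = list(values)
    offset = warp_size // 2
    while offset > 0:
        shifted = shuffle_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


lanes = list(range(64))  # one value per lane of a 64-wide wavefront

# Offsets 32, 16, ..., 1: lane 0 accumulates all 64 lanes.
correct = warp_reduce_sum(lanes, warp_size=64)

# Offsets 16, 8, ..., 1 (hardcoded 32): lane 0 only sees lanes 0..31.
truncated = warp_reduce_sum(lanes, warp_size=32)

print(correct, sum(lanes))   # both are the full sum
print(truncated)             # only the sum of the first 32 lanes
```

   With a hardcoded width of 32, the upper half of a 64-lane wavefront never 
contributes to lane 0, which is the kind of silent mismatch a reduction test 
run on ROCm can expose.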



