t-vi opened a new pull request #5727: URL: https://github.com/apache/incubator-tvm/pull/5727
This adds warp shuffle intrinsics to ROCm and enables reductions.

- There was at least one hardcoded assumption of 32 threads per warp in `lower_thread_allreduce`.
- I have tentatively hijacked a couple of CUDA codegen tests that were useful in verifying that the ROCm path works. I'm not quite sure what to do with that: having ROCm tests in `test_target_codegen_cuda` might not be intuitive, but code duplication is bad, too. The tests helped find the hardcoded 32 threads mentioned above.
- This is my first meddling with intrinsics and producing expressions, so bear with me.
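To illustrate why a hardcoded warp size of 32 matters for a shuffle-based reduction, here is a minimal pure-Python simulation. It is only a sketch, not the code in `lower_thread_allreduce`: the names `shfl_down` and `warp_reduce_sum` are hypothetical, and the out-of-range behaviour (a lane keeps its own value) is an assumption modelled on how shuffle-down intrinsics commonly behave. It assumes a 64-lane wavefront, as found on many AMD GPUs.

```python
def shfl_down(values, delta):
    """Simulate shuffle-down: lane i reads lane i + delta.

    Assumption: a lane whose source is out of range keeps its own value.
    """
    width = len(values)
    return [values[i + delta] if i + delta < width else values[i]
            for i in range(width)]


def warp_reduce_sum(lane_values, assumed_warp_size):
    """Tree-reduce the per-lane values and return what lane 0 ends up with.

    len(lane_values) is the real hardware warp size; assumed_warp_size is
    what the (hypothetical) lowering pass believes it to be.
    """
    vals = list(lane_values)
    offset = assumed_warp_size // 2
    while offset > 0:
        shuffled = shfl_down(vals, offset)  # all lanes read old values at once
        vals = [a + b for a, b in zip(vals, shuffled)]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    lanes = list(range(64))  # one value per lane of a 64-wide wavefront
    print("expected sum:      ", sum(lanes))
    print("warp size 64:      ", warp_reduce_sum(lanes, 64))
    print("hardcoded 32 lanes:", warp_reduce_sum(lanes, 32))
```

With 64 real lanes and an assumed warp size of 32, the first shuffle offset is 16 instead of 32, so lane 0 only accumulates lanes 0 through 31 and the upper half of the wavefront is silently dropped. Deriving the starting offset from the target's actual warp size avoids this.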
