MasterJH5574 opened a new pull request, #15323: URL: https://github.com/apache/tvm/pull/15323
This PR fixes a bug in the LowerThreadAllreduce pass. Before this PR, the thread mask was set incorrectly in multi-group settings: when the reduction extent is 32, the mask was always 0. The bug went unnoticed because the CUDA program still produced correct results even with a zero mask. Regardless, a zero mask is dangerous and should be fixed.
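For intuition about what a non-zero mask should look like, here is a minimal Python sketch of the per-group lane mask one would expect a warp shuffle to receive. It is not the code from this PR or from the pass itself. The helper name `expected_group_mask`, the assumption that reduction groups sit back to back in linear thread order, and the assumption that the reduction extent divides the warp size are all illustrative. CUDA's `*_sync` shuffle intrinsics require each calling thread to have its own bit set in the mask, which is why a mask of 0 is unsafe even if results happen to look correct.

```python
WARP_SIZE = 32  # lanes per CUDA warp


def expected_group_mask(reduce_extent: int, group_index: int) -> int:
    """Return the lanes of the current warp that belong to one reduction group.

    Hypothetical helper for illustration only. Assumes groups are laid out
    back to back in linear thread order and that reduce_extent divides
    WARP_SIZE, so a group never straddles two warps.
    """
    if not (0 < reduce_extent <= WARP_SIZE) or WARP_SIZE % reduce_extent != 0:
        raise ValueError("reduce_extent must divide the warp size")
    # Lane at which this group starts inside its warp.
    first_lane = (group_index * reduce_extent) % WARP_SIZE
    # Python integers are unbounded, so 1 << 32 does not overflow here.
    lanes = (1 << reduce_extent) - 1
    # Keep only the 32 bits that a warp mask can hold.
    return (lanes << first_lane) & 0xFFFFFFFF


if __name__ == "__main__":
    for extent, group in [(8, 0), (8, 1), (8, 5), (32, 0), (32, 3)]:
        mask = expected_group_mask(extent, group)
        print(f"extent={extent:2d} group={group}: mask=0x{mask:08x}")
```

Under these assumptions, a reduction extent of 32 means every group fills a whole warp, so the expected mask is the full `0xffffffff` for every group index, never 0.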
