MasterJH5574 opened a new pull request, #15323:
URL: https://github.com/apache/tvm/pull/15323

   This PR fixes a bug in the LowerThreadAllreduce pass.
   
   Prior to this PR, in multi-group settings, the thread mask was not set 
correctly: when the reduction extent is 32, the thread mask was always 0. The 
bug went unnoticed because the generated CUDA program still produces correct 
results even with a zero mask. Nevertheless, relying on a zero mask is unsafe, 
so it should be fixed.
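The PR text does not show how the mask is computed. The sketch below assumes that each group occupies `reduce_extent` consecutive lanes of a 32-lane warp, and shows the mask one would expect in that case. This is the kind of value passed as the first argument of CUDA warp primitives such as `__shfl_down_sync(mask, var, delta)`. One common way to get a zero mask is to evaluate `(1 << extent) - 1` in 32-bit unsigned arithmetic. For an extent of 32, that shift is undefined in C/C++, and on some hardware it wraps to `1 << 0`, which gives 0. This is an illustrative assumption, not necessarily the exact cause fixed here. `group_mask` is a hypothetical helper, not a TVM API.

```python
WARP_SIZE = 32
FULL_MASK = (1 << WARP_SIZE) - 1  # 0xFFFFFFFF: every lane of the warp


def group_mask(reduce_extent: int, group_index: int) -> int:
    """Hypothetical helper: 32-bit lane mask for the group_index-th group
    of reduce_extent consecutive lanes within a single warp."""
    if not 1 <= reduce_extent <= WARP_SIZE:
        raise ValueError("reduce_extent must be in [1, 32]")
    if group_index < 0 or reduce_extent * (group_index + 1) > WARP_SIZE:
        raise ValueError("group does not fit in a single warp")
    # Python ints do not overflow, so extent == 32 yields 0xFFFFFFFF here,
    # whereas a 32-bit `(1u << 32) - 1` in C/CUDA is undefined behavior.
    lanes = (1 << reduce_extent) - 1  # low `reduce_extent` bits set
    # Shift the group's bits to its lane offset and keep 32 bits.
    return (lanes << (reduce_extent * group_index)) & FULL_MASK


for extent, idx in [(32, 0), (16, 1), (8, 2)]:
    print(f"extent={extent} group={idx} mask=0x{group_mask(extent, idx):08x}")
```

With an extent of 32 there is only one group per warp, so its mask must be the full `0xffffffff` rather than 0.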

