MasterJH5574 opened a new pull request, #15399:
URL: https://github.com/apache/tvm/pull/15399

   PR #15327 and #15373 introduced the multi-warp allreduce implementation. At 
that time, I tested its correctness numerically with the workload "take a 
matrix of ones as input and compute the sum over each row". Both PRs passed 
this numerical test, but I didn't realize that the test is incomplete and 
cannot guarantee correctness.
   
   The previous implementation has a bug, which can be exposed by changing the 
input matrix from ones to random floating-point numbers.
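
   To illustrate why an all-ones input can hide this class of bug, here is a 
minimal pure-Python sketch. It is not the actual TVM code and does not 
reproduce the actual bug: `allreduce_buggy` and its wrong-index mistake are 
hypothetical, chosen only because they give the right answer whenever every 
warp's partial sum is identical.

```python
import random

WARP_SIZE = 32  # assumed warp width for this illustration


def warp_partials(row):
    """Stage 1: each warp reduces its own contiguous chunk of the row."""
    return [sum(row[i:i + WARP_SIZE]) for i in range(0, len(row), WARP_SIZE)]


def allreduce_correct(row):
    """Stage 2 (correct): combine the partial sum of every warp."""
    return sum(warp_partials(row))


def allreduce_buggy(row):
    """Stage 2 (hypothetical bug): reads warp 0's partial for every warp."""
    partials = warp_partials(row)
    return sum(partials[0] for _ in partials)


def is_close(a, b, tol=1e-9):
    """Relative/absolute closeness check for floating-point sums."""
    return abs(a - b) <= tol * max(1.0, abs(a), abs(b))


# Two warps' worth of data per row.
ones = [1.0] * (2 * WARP_SIZE)
rng = random.Random(0)
rand_row = [rng.random() for _ in range(2 * WARP_SIZE)]

# All-ones input: every warp partial is identical, so the bug is invisible.
print("ones, buggy matches reference:  ", is_close(allreduce_buggy(ones), sum(ones)))
# Random input: warp partials differ, so the wrong index shows up.
print("random, buggy matches reference:", is_close(allreduce_buggy(rand_row), sum(rand_row)))
print("random, correct matches reference:", is_close(allreduce_correct(rand_row), sum(rand_row)))
```

   With uniform input, any mistake that mixes up which lane or warp 
contributes a value leaves the sum unchanged, which is why random inputs make 
a stronger numerical test.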
   
   Therefore, this PR fixes the issues and adds numerical tests for 
multi-warp allreduce to `test_allreduce_cuda.py`. By removing some of the 
redundant tests in that file, we hope to reduce the testing time a bit while 
still guaranteeing correctness.
   
   Sorry for not testing the implementation completely before.


