manishucsd commented on pull request #10185:
URL: https://github.com/apache/tvm/pull/10185#issuecomment-1033063599


   On accuracy: floating-point addition is not associative, so changing the 
order of the additions can change the result. Parallel reduction does change the 
order of accumulation over GEMM-K (NPQ), so some difference in the results is 
expected. I don't have guidance on what threshold to set when checking relative 
error. 
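
   As a rough illustration of the point above, here is a minimal NumPy sketch 
(not CUTLASS or cuDNN code). The helpers `serial_sum` and `split_sum` are 
hypothetical names; they only mimic accumulating over K in one pass versus in 
several slices whose partial sums are then combined.

```python
import numpy as np

# Three fp32 values whose sum depends on how the additions are grouped.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
left = (a + b) + c    # (1e8 - 1e8) + 1
right = a + (b + c)   # -1e8 + 1 rounds back to -1e8 in fp32


def serial_sum(x):
    """Accumulate in fp32, one element at a time, left to right."""
    acc = np.float32(0.0)
    for v in x:
        acc = np.float32(acc + v)
    return acc


def split_sum(x, slices):
    """Accumulate each slice separately, then add the partial sums."""
    partials = [serial_sum(chunk) for chunk in np.array_split(x, slices)]
    return serial_sum(partials)


rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
reference = float(np.sum(x.astype(np.float64)))  # fp64 reference sum

serial = serial_sum(x)
split = split_sum(x, 8)

print("grouping example:", left, right)
print("serial  abs error vs fp64:", abs(float(serial) - reference))
print("split-8 abs error vs fp64:", abs(float(split) - reference))
```

   Both orderings are valid fp32 results; they may simply differ from each 
other in the last few bits.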
   
   I would follow Haicheng's suggestion here: 
   > If you want to investigate accuracy issue, i suggest you compare both 
cutlass and cudnn with a naive fp64 or fp32 version. 
   Run FP32 wgrad with no split-k and compare both cutlass and cudnn against 
this golden reference. 
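
   A sketch of that comparison, assuming plain NumPy matrix multiplies as 
stand-ins for the two library outputs (wgrad viewed as a GEMM). The names 
`relative_error` and `split_k_matmul`, the problem size, and the error metric 
(max absolute difference scaled by the largest reference magnitude) are all 
assumptions for illustration, not something either library prescribes.

```python
import numpy as np


def relative_error(result, reference):
    """Max absolute difference, scaled by the largest reference magnitude."""
    ref = np.asarray(reference, dtype=np.float64)
    diff = np.abs(np.asarray(result, dtype=np.float64) - ref)
    return float(diff.max() / np.abs(ref).max())


def split_k_matmul(a, b, slices):
    """fp32 matmul that reduces over K in `slices` separate partial products."""
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    a_parts = np.array_split(a, slices, axis=1)
    b_parts = np.array_split(b, slices, axis=0)
    for a_part, b_part in zip(a_parts, b_parts):
        out += a_part @ b_part
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 1024)).astype(np.float32)
b = rng.standard_normal((1024, 64)).astype(np.float32)

# Golden reference: same inputs, computed in fp64.
golden = a.astype(np.float64) @ b.astype(np.float64)

# Stand-ins for the two implementations under test.
err_serial = relative_error(a @ b, golden)
err_split = relative_error(split_k_matmul(a, b, 8), golden)

print("no split-k error vs fp64:", err_serial)
print("split-k=8  error vs fp64:", err_split)
```

   Comparing each implementation against the same reference shows whether 
either one is an outlier, rather than only how far they are from each other.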
   
   The CUTLASS profiler initializes tensors and matrices with integer inputs 
to make error checking easier. You can use the same approach to make sure there 
are no functional errors, i.e., try the operation on integer inputs.
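
   A minimal NumPy sketch of why integer inputs help, again using a plain 
matmul as a stand-in for the real kernels. The value range [-3, 3] and the 
helper `split_k_matmul` are assumptions chosen here; the profiler's actual 
ranges are not stated above. With small integers, every product and partial 
sum is exactly representable in fp32, so the accumulation order no longer 
matters and results can be compared for exact equality.

```python
import numpy as np


def split_k_matmul(a, b, slices):
    """fp32 matmul that reduces over K in `slices` separate partial products."""
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    a_parts = np.array_split(a, slices, axis=1)
    b_parts = np.array_split(b, slices, axis=0)
    for a_part, b_part in zip(a_parts, b_parts):
        out += a_part @ b_part
    return out


rng = np.random.default_rng(0)
k = 1024
a_int = rng.integers(-3, 4, size=(32, k))   # integers in [-3, 3]
b_int = rng.integers(-3, 4, size=(k, 32))

# Exact reference in integer arithmetic.
exact = a_int.astype(np.int64) @ b_int.astype(np.int64)

# Same data in fp32. |sum| <= 9 * k = 9216, far below 2**24, so every
# intermediate value is an integer that fp32 represents exactly.
a_f32 = a_int.astype(np.float32)
b_f32 = b_int.astype(np.float32)
serial = a_f32 @ b_f32
split = split_k_matmul(a_f32, b_f32, 8)

print("serial matches exact:", np.array_equal(serial, exact))
print("split-k matches exact:", np.array_equal(split, exact))
```

   Any mismatch on such inputs points to a functional bug rather than to 
rounding.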



