AndrewZhaoLuo opened a new pull request #8341:
URL: https://github.com/apache/tvm/pull/8341


   CUDA codegen does not seem to handle half types well, and mixing half 
and float types seems to expose additional issues. In addition, some schedules 
that are supposed to support heterogeneous output dtypes do not.
   
   This seems to be a problem in codegen, not in the mixed precision pass, so 
for now I am turning off accumulation into FP32 in the mixed precision pass. 
With this change we can tune BERT and YoloV2, with results here:
   
   I will leave the codegen issues for 
https://github.com/apache/tvm/issues/8294. 
   I will leave the issues with schedules not supporting output dtypes for 
https://github.com/apache/tvm/issues/8340.
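
   For context on what turning off FP32 accumulation trades away, here is a 
minimal pure-Python sketch. It is not TVM code and not what this PR changes: 
the helpers `to_fp16`, `to_fp32`, and `dot` are hypothetical names for 
illustration, and the loop models each step as an exact multiply-add followed 
by one rounding to the accumulator dtype, which is a simplification of what 
real hardware does.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16 inputs, rounding the accumulator with round_acc.

    Assumption: the multiply-add is computed exactly, then rounded once
    per step to the accumulator dtype.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_fp16(x) * to_fp16(y))
    return acc


ones = [1.0] * 4096

# FP16 accumulator: above 2048 the spacing between FP16 values is 2,
# so 2048 + 1 rounds back to 2048 and the sum stops growing.
print(dot(ones, ones, to_fp16))  # 2048.0

# FP32 accumulator: every partial sum is exactly representable.
print(dot(ones, ones, to_fp32))  # 4096.0
```

   Long reductions are where the two accumulator choices diverge most, so 
models with large reduction dimensions are the ones to watch while FP32 
accumulation is disabled.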

