AndrewZhaoLuo opened a new pull request #8341: URL: https://github.com/apache/tvm/pull/8341
CUDA codegen does not seem to handle half types well, and mixing half and float types seems to expose additional issues. In addition, some schedules that are supposed to support heterogeneous output dtypes do not. This looks like a problem in codegen and the schedules rather than in the mixed precision pass, so for now I am turning off accumulation into FP32 for the mixed precision pass.

With this change we can tune BERT and YoloV2, with results here:

I will leave the codegen issues for https://github.com/apache/tvm/issues/8294 and the schedules that do not support output dtypes for https://github.com/apache/tvm/issues/8340.
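For readers unfamiliar with what "accumulating into FP32" buys, here is a minimal sketch of the numerical trade-off this workaround accepts. It does not use TVM or CUDA: it simulates FP16 and FP32 rounding with Python's standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `dot`) are hypothetical, chosen only for this illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product of FP16 inputs; `accumulate` rounds the running sum.

    Passing `to_fp16` mimics accumulating in FP16, while passing `to_fp32`
    mimics FP16 inputs with an FP32 accumulator.
    """
    total = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)
        total = accumulate(total + product)
    return total


# 4096 ones: the exact dot product is 4096.
a = [1.0] * 4096
b = [1.0] * 4096

fp16_result = dot(a, b, to_fp16)
fp32_result = dot(a, b, to_fp32)

print("FP16 accumulator:", fp16_result)  # stalls at 2048.0
print("FP32 accumulator:", fp32_result)  # 4096.0
```

The FP16 accumulator stops growing at 2048 because the spacing between adjacent half-precision values there is 2, so adding 1 rounds back to 2048. This is only meant to show why FP32 accumulation is normally preferred for long reductions; how much it matters for a given model is an empirical question.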
