DickJC123 opened a new pull request #7447: Tensorcore fullyconnected support2
URL: https://github.com/apache/incubator-mxnet/pull/7447

Consider this an alternative approach to getting TensorCore working with FullyConnected. It is far simpler than my first PR for this functionality. If nothing else, it is proof that one can invoke TensorCore algos by manipulating the cuBLAS handle alongside the existing dot function's use of Hgemm and SgemmEx.

This PR also shows the kind of per-instance handle manipulation that is necessary: blindly enabling TensorCore on the handle globally has the unfortunate side effect of introducing fp16 casts on the inputs of fp32-I/O gemms.

Bottom line: I wouldn't expect you to accept this PR without a discussion. I have begun studying the new linear algebra code with the idea of producing an enable-TensorCore PR for that approach. I notice the new LA code doesn't support fp16-I/O gemms yet, and the solution there will not fit the mold of the existing function templates. Also, what is the plan for switching MXNet's use of dot() over to the new functions?
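For reviewers unfamiliar with the handle-manipulation idea, the per-instance toggling described above can be sketched as follows. This is a hypothetical illustration using the CUDA 9 cuBLAS API (`cublasGetMathMode`/`cublasSetMathMode`), not the PR's actual diff; the wrapper name `HgemmWithTensorCore` is invented for the example, and the code requires a TensorCore-capable GPU to run.

```cuda
#include <cublas_v2.h>

// Hypothetical sketch (not the PR's actual code): opt a single fp16-I/O gemm
// in to TensorCore algos by toggling the handle's math mode around the call,
// then restoring the previous mode so fp32-I/O gemms issued through the same
// handle are not silently given fp16 input casts.
cublasStatus_t HgemmWithTensorCore(cublasHandle_t handle,
                                   cublasOperation_t ta, cublasOperation_t tb,
                                   int m, int n, int k,
                                   const __half* alpha,
                                   const __half* A, int lda,
                                   const __half* B, int ldb,
                                   const __half* beta,
                                   __half* C, int ldc) {
  cublasMath_t saved_mode;
  cublasStatus_t err = cublasGetMathMode(handle, &saved_mode);
  if (err != CUBLAS_STATUS_SUCCESS) return err;
  // Enable TensorCore algo selection for this one call (CUDA 9 API).
  err = cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);
  if (err != CUBLAS_STATUS_SUCCESS) return err;
  err = cublasHgemm(handle, ta, tb, m, n, k,
                    alpha, A, lda, B, ldb, beta, C, ldc);
  // Restore the caller's math mode regardless of the gemm's status.
  cublasSetMathMode(handle, saved_mode);
  return err;
}
```

The save/set/restore pattern is the point: it scopes `CUBLAS_TENSOR_OP_MATH` to the fp16 gemm alone, which is why a global handle setting is not an acceptable shortcut.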