ptrendx commented on issue #17665: No speedup from using FP16 (4 times slower than PyTorch)
URL: https://github.com/apache/incubator-mxnet/issues/17665#issuecomment-592632257

Hmmm, when I run this with the NVIDIA container, I get the s884 (Tensor Core) kernel and the time is 8.1. Looking at the `dot` implementation, it seems that in our version it goes through `linalg_gemm`, whereas upstream MXNet uses some `dot` function (which I have not found yet; I assume it is in mshadow?), and I guess that path does not set the proper math mode. Git blame shows that @DickJC123 apparently changed our version to use `linalg_gemm` 3 years ago, and for some reason it never got upstreamed. @DickJC123, could you make a PR with that? On our side it is commit

```
commit 46d7fe1d3d482b2d43573ae483bd8403a843fedf
Author: Dick Carter <[email protected]>
Date:   Fri Oct 6 13:57:38 2017 -0700

    Switched mx.sym.{batched_dot,dot} to use {linalg_batched_gemm,linalg_gemm}.
```
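For readers wondering what "the proper math mode" means here: on Volta-class GPUs, cuBLAS only dispatches the fast `s884` Tensor Core kernels when the handle's math mode permits it. The following is a minimal illustrative CUDA sketch, not MXNet's actual call site; the function name, layouts, and buffer setup are assumptions made for the example.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hedged sketch: how an FP16 GEMM ends up on the s884 Tensor Core
// kernels. This is NOT MXNet's code path; it only illustrates the
// cuBLAS math-mode setting that a linalg_gemm-style path configures
// and that a plain dot path presumably does not.
void fp16_gemm_with_tensor_ops(cublasHandle_t handle,
                               const __half* A, const __half* B, __half* C,
                               int m, int n, int k) {
  // Without this call the handle stays at CUBLAS_DEFAULT_MATH and
  // cuBLAS falls back to the slower non-Tensor-Core FP16 kernels.
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

  const float alpha = 1.0f, beta = 0.0f;
  // FP16 inputs/outputs with FP32 accumulation; column-major, no transpose.
  cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
               m, n, k,
               &alpha,
               A, CUDA_R_16F, m,
               B, CUDA_R_16F, k,
               &beta,
               C, CUDA_R_16F, m,
               CUDA_R_32F,
               CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```

With the math mode set this way, a profiler run (nvprof or Nsight) should show `s884gemm`-style kernel names, matching the observation above; with the default math mode it will not.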
