ptrendx commented on issue #17665: No speedup from using FP16 (4 times slower than PyTorch)
URL: https://github.com/apache/incubator-mxnet/issues/17665#issuecomment-592632257
 
 
   Hmmm, when I run this with the NVIDIA container, I get the s884 kernel and the time is 8.1. Looking at the `dot` implementation, it seems that in our version it goes through `linalg_gemm`, whereas upstream MXNet uses some `dot` function (which I have not found yet; I assume it is in mshadow?), and I guess it does not set the proper math mode there.
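   
   For context, here is a minimal standalone sketch (not MXNet code; the matrix size, handle setup, and the direct `cublasGemmEx` call are my own assumptions for illustration, targeting CUDA 10-era cuBLAS) of what "setting the proper math mode" means at the cuBLAS level. Unless the handle is switched from `CUBLAS_DEFAULT_MATH` to `CUBLAS_TENSOR_OP_MATH`, an FP16 GEMM is not eligible for the Tensor Core (s884) kernels, which would explain seeing no FP16 speedup:
   ```
   // Illustrative sketch only: shows the math-mode switch that gates
   // Tensor Core (s884) kernel selection for an FP16 GEMM.
   #include <cublas_v2.h>
   #include <cuda_fp16.h>
   
   int main() {
     const int n = 4096;  // assumed square problem size for illustration
     __half *a, *b, *c;   // device buffers, left uninitialized on purpose
     cudaMalloc(&a, sizeof(__half) * n * n);
     cudaMalloc(&b, sizeof(__half) * n * n);
     cudaMalloc(&c, sizeof(__half) * n * n);
   
     cublasHandle_t handle;
     cublasCreate(&handle);
   
     // Without this call the handle stays in CUBLAS_DEFAULT_MATH and the
     // FP16 GEMM may be served by plain (non-Tensor-Core) kernels.
     cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);
   
     // FP16 inputs/outputs with FP32 accumulation.
     const float alpha = 1.0f, beta = 0.0f;
     cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                  &alpha, a, CUDA_R_16F, n,
                          b, CUDA_R_16F, n,
                  &beta,  c, CUDA_R_16F, n,
                  CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
   
     cudaDeviceSynchronize();
     cublasDestroy(handle);
     cudaFree(a); cudaFree(b); cudaFree(c);
     return 0;
   }
   ```
   Profiling this with and without the `cublasSetMathMode` call (e.g. under nvprof) should show whether the s884 kernels get picked up.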
   
   Git blame shows that apparently @DickJC123 changed our version to use `linalg_gemm` 3 years ago, and for some reason it never got upstreamed.
   
   @DickJC123 Could you make a PR with that? On our side it is commit
   ```
    commit 46d7fe1d3d482b2d43573ae483bd8403a843fedf
    Author: Dick Carter <[email protected]>
    Date:   Fri Oct 6 13:57:38 2017 -0700
    
        Switched mx.sym.{batched_dot,dot} to use {linalg_batched_gemm,linalg_gemm}.
   ```
   
