sbodenstein commented on issue #11796: Batch_dot does not support FP16 well
URL: 
https://github.com/apache/incubator-mxnet/issues/11796#issuecomment-436414711
 
 
   @szha: can we reopen this? For some reason, the fix in 
https://github.com/dmlc/mshadow/pull/353 was reverted by 
[this commit](https://github.com/eric-haibin-lin/mshadow/commit/c879f3b7a877b8838f7b64c8e72b4ac3cc82e9d0)
 from @eric-haibin-lin .
   
   This code, run on version `1.3.0` (latest EC2 Deep Learning AMI):
   ```python
   import mxnet as mx
   import time

   a = mx.nd.ones((100, 100, 100), ctx=mx.gpu(), dtype='float16')
   b = mx.nd.ones((100, 100, 100), ctx=mx.gpu(), dtype='float16')

   # Warm up so lazy initialization doesn't skew the timing
   for i in range(10):
       c = mx.nd.batch_dot(a, b)
   mx.nd.waitall()

   begin = time.time()
   for i in range(500):
       c = mx.nd.batch_dot(a, b)
   mx.nd.waitall()  # block until all async ops finish before stopping the clock
   end = time.time()
   print(end - begin)
   ```
   takes 0.9 s on a V100 (versus 0.0318 s with float32, roughly a 30x slowdown!)
   
   We want to implement transformers using Tensor Cores for training, but there is currently no way to do this in MXNet (`linalg_gemm` and `linalg_gemm2` unfortunately don't support float16 either, despite a float16 path seemingly being implemented [here](https://github.com/apache/incubator-mxnet/blob/49e6a7e40691936e533f7cf16848b10c025e4e75/src/operator/linalg_impl.h#L244)).
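   In the meantime, one workaround is to do the batched matmul in float32 and cast the result back to float16, trading memory bandwidth for the fast float32 GEMM path. The helper name `batch_dot_fp16_via_fp32` below is hypothetical; the pattern is sketched here with NumPy so it runs anywhere, but the same cast-compute-cast structure applies with `a.astype('float32')` and `mx.nd.batch_dot` in MXNet:

   ```python
   import numpy as np

   def batch_dot_fp16_via_fp32(a, b):
       """Hypothetical workaround sketch: batched matmul of float16 inputs,
       computed in float32 and cast back to float16."""
       # a: (batch, m, k), b: (batch, k, n), both float16
       c32 = np.matmul(a.astype(np.float32), b.astype(np.float32))
       return c32.astype(np.float16)

   a = np.ones((4, 8, 8), dtype=np.float16)
   b = np.ones((4, 8, 8), dtype=np.float16)
   c = batch_dot_fp16_via_fp32(a, b)
   # each output entry is a dot product of 8 ones, so every value is 8.0
   ```

   This sidesteps the slow float16 path but of course gives none of the Tensor Core speedup this issue is asking for.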
   
   What is the plan for exposing any form of GEMM to users with Real16 and 
TensorCore support?
   
   @szhengac 
