oleg-trott opened a new issue #17665: No speedup from using FP16 (4 times slower than PyTorch)
URL: https://github.com/apache/incubator-mxnet/issues/17665

## Description

For dot products, there is no speedup from using FP16 in MXNet (it is 4 times slower than PyTorch) on an RTX 2080 Ti. For ConvNets, there is similarly little or no gain from FP16 in MXNet, unlike with PyTorch.

## To Reproduce

MXNet:

```
import mxnet as mx
import numpy as np
import time

n = 2**14
ctx = mx.gpu(0)
dtype = np.float16

with ctx:
    a = mx.nd.zeros((n, n), dtype=dtype)
    b = mx.nd.zeros((n, n), dtype=dtype)
    c = mx.nd.zeros((n, n), dtype=dtype)

    tic = time.time()
    for _ in range(100):
        mx.nd.dot(a, b, out=c)
    res = float(c[0, 0].asscalar())  # "use" the result
    print(time.time() - tic)
```

(Outputs approximately 60.)

PyTorch:

```
import torch
import numpy as np
import time

n = 2**14
dtype = torch.float16

a = torch.zeros((n, n), dtype=dtype).cuda()
b = torch.zeros((n, n), dtype=dtype).cuda()
c = torch.zeros((n, n), dtype=dtype).cuda()

tic = time.time()
with torch.no_grad():
    for _ in range(100):
        torch.matmul(a, b, out=c)
res = float(c[0, 0])  # "use" the result
print(time.time() - tic)
```

(Outputs approximately 14.)

## What have you tried to solve it?

I suspect that tensor cores are not enabled for this GPU in MXNet. I tried to figure out whether there is some flag or environment variable that I'm missing, but found nothing.

## Environment

- Nvidia RTX 2080 Ti
- Ubuntu 18.04
- CUDA 10.1
- PyTorch 1.3.1
- MXNet installed with `~/anaconda3/bin/pip install mxnet-cu101mkl`
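As a rough sanity check on the tensor-core hypothesis, the reported times can be converted into achieved TFLOPS (a sketch using only the numbers from the issue; the peak figures in the comments are approximate public specs for the RTX 2080 Ti, not measurements from this run):

```
# Back-of-the-envelope: turn the reported wall-clock times into achieved TFLOPS.
# Numbers taken from the issue: n = 2**14, 100 matmuls, ~60 s (MXNet) vs ~14 s (PyTorch).
n = 2 ** 14
flops_per_matmul = 2 * n ** 3          # one multiply-add counted as 2 FLOPs
total_flops = 100 * flops_per_matmul

for name, seconds in [("MXNet", 60.0), ("PyTorch", 14.0)]:
    tflops = total_flops / seconds / 1e12
    print(f"{name}: {tflops:.1f} TFLOPS")
# MXNet: 14.7 TFLOPS, PyTorch: 62.8 TFLOPS
```

~15 TFLOPS is in the ballpark of the 2080 Ti's non-tensor-core rates (roughly 13 TFLOPS FP32), while ~63 TFLOPS is only plausible with tensor cores engaged (peak FP16 tensor-core throughput is on the order of 100 TFLOPS), which is consistent with the suspicion above.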
