oleg-trott edited a comment on issue #17665: No speedup from using FP16 (4 times slower than PyTorch)
URL: https://github.com/apache/incubator-mxnet/issues/17665#issuecomment-592698779

@ptrendx Without `multi_precision`, `mxnet.optimizer.SGD` says it just uses the same precision as the weights. However, here, per iteration (seconds), I see:

* FP16 + multi_precision: 0.14
* FP16: 0.21
* FP32: 0.24-0.34

So not using `multi_precision` is actually slower with FP16.

```python
import os
os.environ['MXNET_SAFE_ACCUMULATION'] = '1'

import numpy as np
from time import time

import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)

m = vision.resnet50_v2(pretrained=True, ctx=ctx)

bs = 32 * 2
n = 224

with ctx:
    x = nd.random.randn(bs, 3, n, n)
    target = nd.zeros(bs, dtype=np.int32)

if 1:  # change this to toggle FP16 vs FP32
    x = x.astype('float16')
    m.cast('float16')

loss = gluon.loss.SoftmaxCrossEntropyLoss()

if 1:  # change this to toggle multi_precision off/on
    args = {'learning_rate': 1e-9}
else:
    args = {'learning_rate': 1e-9, 'multi_precision': True}

opt = gluon.Trainer(m.collect_params(), 'sgd', args)

for i in range(100):
    tic = time()
    with autograd.record():
        y = m(x)
        out = loss(y, target)
    out.backward()
    opt.step(batch_size=bs)
    nd.waitall()
    print(time() - tic)
```
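For reference, the same toggle can also be expressed at the optimizer level instead of through the `Trainer` args dict. This is a minimal sketch (it assumes the `m` and the learning rate from the script above); `multi_precision=True` is the option that keeps an FP32 master copy of FP16 weights and applies the SGD update in FP32:

```python
import mxnet as mx
from mxnet import gluon

# Plain SGD: updates are done in the weights' own dtype (FP16 here).
sgd_fp16 = mx.optimizer.SGD(learning_rate=1e-9)

# multi_precision SGD: keeps an FP32 master copy of the FP16 weights
# and performs the update in FP32.
sgd_mp = mx.optimizer.SGD(learning_rate=1e-9, multi_precision=True)

# gluon.Trainer accepts an Optimizer instance in place of the
# name-plus-args form used in the script above.
opt = gluon.Trainer(m.collect_params(), sgd_mp)
```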
