PistonY commented on issue #13709: Why FP16 training speed is too slow on Tesla T4 in Gluon? URL: https://github.com/apache/incubator-mxnet/issues/13709#issuecomment-458792488

I tried a fixed input size: FP32 works well, but FP16 runs out of memory. This is my script:

```python
import time

import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon import loss as gloss
from gluoncv.model_zoo import resnet101_v2

ctx = mx.gpu(0)
data = nd.random.normal(shape=(64, 3, 224, 224), ctx=ctx)
label = nd.random.randint(low=0, high=1, shape=(64, 1), ctx=ctx)

net = resnet101_v2()
net.hybridize()
net.initialize(ctx=ctx)
net(data)  # warm-up forward pass to trigger shape inference / allocation

test_num = 500
dtype = 'float16'  # 'float32' or 'float16'
if dtype != 'float32':
    net.cast(dtype)

Loss = gloss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'nag',
                        {'learning_rate': 0.1,
                         'momentum': 0.9,
                         'multi_precision': True})  # keep FP32 master weights when fp16 is enabled

sta = time.time()
for _ in range(test_num):
    with autograd.record():
        output = net(data.astype(dtype, copy=False))
        loss = Loss(output, label.astype(dtype, copy=False))
    loss.backward()
    trainer.step(64)  # batch size is 64
nd.waitall()  # MXNet is asynchronous; wait for all pending work before stopping the timer
end = time.time()
print(end - sta)
```

mxnet version is 1.5.0 (--pre).

When training with FP32, it costs 9921 MB of memory and takes 75 s. But when I tested with FP16, memory usage grew continuously from 7000 MB until out of memory. I don't know why; it looks like memory is not being freed.
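As an aside on the `multi_precision: True` flag used above: the usual rationale for a mixed-precision trainer is that small weight updates can round away entirely in FP16, so an FP32 "master copy" of the weights is kept for the optimizer step. The sketch below is a NumPy-only illustration of that effect, not MXNet's actual trainer code; the values (`lr`, `grad`, step count) are made up for demonstration.

```python
import numpy as np

lr = np.float32(0.01)
grad = np.float32(0.01)  # per-step update of lr * grad = 1e-4

# Pure FP16 update: near 1.0, FP16 spacing is about 9.8e-4, so a 1e-4 step
# rounds back to the same value and the weight never moves.
w16 = np.float16(1.0)
for _ in range(100):
    w16 = np.float16(w16 - np.float16(lr) * np.float16(grad))

# Mixed precision: accumulate the update in an FP32 master weight,
# then cast down to FP16 only for compute.
w32 = np.float32(1.0)
for _ in range(100):
    w32 = w32 - lr * grad
w_mixed = np.float16(w32)

print(w16)      # still 1.0 -- every update was lost to FP16 rounding
print(w_mixed)  # ~0.99  -- the accumulated updates survive in FP32
```

This is why disabling `multi_precision` with FP16 typically stalls training even when speed and memory look fine.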
