Hello, I am trying to use AMP on an RNN model, but I am not seeing higher throughput with AMP enabled, and the loss appears to have stagnated. I am wondering if I am missing something.
Also, has AMP been tested on any RNN models, and are there any benchmarks? I would appreciate some input here. I used the RNN model in [1] and followed the tutorial in [2]; the output of the two runs is below.

----
Without AMP:

mxnet-lm$ python train.py --cuda --tied --nhid 1500 --emsize 1500 --epochs 60 --dropout 0.65 --model gru --batch_size 128
[Epoch 3 Batch 200/13] loss 6.47, ppl 648.24, throughput 675.94 samples/s
[Epoch 3 Batch 400/13] loss 6.30, ppl 543.20, throughput 679.51 samples/s
[Epoch 3] time cost 90.29s, valid loss 5.97, valid ppl 392.94
test loss 5.89, test ppl 361.69
[Epoch 4 Batch 200/13] loss 6.15, ppl 470.58, throughput 676.46 samples/s
[Epoch 4 Batch 400/13] loss 6.01, ppl 408.21, throughput 679.51 samples/s
[Epoch 4] time cost 90.27s, valid loss 5.69, valid ppl 296.89
test loss 5.63, test ppl 277.58

----
With AMP:

(gluonnlp) ubuntu@ip-172-30-0-140:~/mxnet-lm$ python train.py --cuda --tied --nhid 1500 --emsize 1500 --epochs 60 --dropout 0.65 --model gru --batch_size 128 --amp True
Namespace(amp=True, batch_size=128, bptt=35, clip=0.25, cuda=True, dropout=0.65, emsize=1500, epochs=60, export_model=False, gcthreshold=0.5, gctype='none', hybridize=False, log_interval=200, lr=20, model='gru', nhid=1500, nlayers=2, save='model.params', static_alloc=False, static_shape=False, tied=True)
using AMP
INFO:root:Using AMP
[Epoch 3 Batch 200/13] loss 10.43, ppl 34026.18, throughput 685.66 samples/s
[Epoch 3 Batch 400/13] loss 10.38, ppl 32150.51, throughput 688.99 samples/s
[Epoch 3] time cost 89.04s, valid loss 10.36, valid ppl 31650.83
test loss 10.36, test ppl 31626.99
INFO:root:AMP: increasing loss scale to 131072.000000
[Epoch 4 Batch 200/13] loss 10.42, ppl 33642.12, throughput 686.83 samples/s
[Epoch 4 Batch 400/13] loss 10.37, ppl 31839.51, throughput 689.55 samples/s

----
Changes made to the training loop after initializing AMP and the trainer:

    with autograd.record():
        output, hidden = model(data, hidden)
        # L is a vector of size batch_size * bptt
        L = loss(output, target)
        L = L / (args.bptt * args.batch_size)
        with amp.scale_loss(L, trainer) as scaled_loss:
            mx.autograd.backward(scaled_loss)

----
[1] https://github.com/apache/incubator-mxnet/blob/master/example/gluon/word_language_model/train.py
[2] https://mxnet.apache.org/api/python/docs/tutorials/performance/backend/amp.html

Thanks,
Naveen
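P.S. For completeness, the AMP setup before the training loop is just the two calls from the tutorial in [2]; a minimal sketch (the trainer line below is only a placeholder for the stock SGD trainer already created in train.py, hyperparameters elided):

    from mxnet import gluon
    from mxnet.contrib import amp

    amp.init()  # called once, as early as possible, before the model and trainer are built

    # placeholder for the existing SGD trainer from train.py
    trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': args.lr})

    # attach AMP's dynamic loss scaling to this trainer
    amp.init_trainer(trainer)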