zixuanweeei commented on issue #17086: [MKLDNN] RNN Op gradient computation is 
broken
URL: 
https://github.com/apache/incubator-mxnet/issues/17086#issuecomment-569213067
 
 
   Hi @liuzh91 @szhengac. We have posted https://github.com/apache/incubator-mxnet/pull/17183 to fix the gradient explosion issue in the RNN backward pass. Thanks again for reporting this issue. It would be greatly appreciated if you could give this patch a test.
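   For anyone who wants to double-check the backward pass independently of the fused kernels, the standard sanity check is to compare an analytic BPTT gradient against a central finite difference. Below is a minimal NumPy sketch for a plain tanh RNN cell (not MXNet's fused LSTM; all names are illustrative). A correct backward implementation should agree with the numeric gradient to roughly 1e-6:

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   T, D, H = 5, 3, 4  # time steps, input dim, hidden dim
   x = rng.standard_normal((T, D))
   Wx = rng.standard_normal((H, D)) * 0.1
   Wh = rng.standard_normal((H, H)) * 0.1

   def forward(Wh):
       """h_t = tanh(Wx @ x_t + Wh @ h_{t-1}); returns all hidden states."""
       h, hs = np.zeros(H), []
       for t in range(T):
           h = np.tanh(Wx @ x[t] + Wh @ h)
           hs.append(h)
       return hs

   def loss(Wh):
       return forward(Wh)[-1].sum()

   def analytic_grad(Wh):
       """dL/dWh via backpropagation through time."""
       hs = forward(Wh)
       g = np.zeros_like(Wh)
       dh = np.ones(H)  # dL/dh_T for loss = sum(h_T)
       for t in reversed(range(T)):
           da = dh * (1.0 - hs[t] ** 2)            # through tanh
           h_prev = hs[t - 1] if t > 0 else np.zeros(H)
           g += np.outer(da, h_prev)               # accumulate over time steps
           dh = Wh.T @ da                          # propagate to h_{t-1}
       return g

   def numeric_grad(Wh, eps=1e-6):
       """Central finite-difference gradient, one parameter at a time."""
       g = np.zeros_like(Wh)
       for i in range(H):
           for j in range(H):
               Wp, Wm = Wh.copy(), Wh.copy()
               Wp[i, j] += eps
               Wm[i, j] -= eps
               g[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)
       return g

   err = np.abs(analytic_grad(Wh) - numeric_grad(Wh)).max()
   print("max abs diff:", err)  # tiny for a correct backward pass
   ```

   The same idea applies to the fused LSTM op: run forward/backward once with the MKL-DNN path and once with the unfused fallback on identical inputs, and compare the gradients.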
   
   BTW, we got the training log below:
   ```
   ❯ python word_language_model.py --log-interval=1
   /path/to/mxnet/python/mxnet/optimizer/optimizer.py:167: UserWarning: 
WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing 
optimizer mxnet.optimizer.optimizer.LAMB
     Optimizer.opt_registry[name].__name__))
   Namespace(alpha=2, batch_size=80, beta=1, bptt=70, clip=0.25, dropout=0.4, 
dropout_e=0.1, dropout_h=0.2, dropout_i=0.65, emsize=400, epochs=750, 
eval_only=False, gpu=None, log_interval=1, lr=30, lr_update_factor=0.1, 
lr_update_interval=30, model='lstm', nhid=1150, nlayers=3, ntasgd=False, 
optimizer='sgd', save='model.params', test_mode=False, tied=False, wd=1.2e-06, 
weight_dropout=0.5)
   Use AWDRNN
   AWDRNN(
     (embedding): HybridSequential(
       (0): Embedding(33278 -> 400, float32)
       (1): Dropout(p = 0.65, axes=(0,))
     )
     (encoder): HybridSequential(
       (0): LSTM(400 -> 1150, TNC)
       (1): LSTM(1150 -> 1150, TNC)
       (2): LSTM(1150 -> 1150, TNC)
     )
     (decoder): HybridSequential(
       (0): Dense(None -> 33278, linear)
     )
   )
   [Epoch 0 Batch 1/372] current loss 20.50, ppl 796977445.38, throughput 18.37 
samples/s, lr 30.86
   [Epoch 0 Batch 2/372] current loss 9.51, ppl 13511.50, throughput 39.56 
samples/s, lr 28.29
   [Epoch 0 Batch 3/372] current loss 17.53, ppl 41003388.51, throughput 40.65 
samples/s, lr 27.43
   [Epoch 0 Batch 4/372] current loss 9.45, ppl 12761.47, throughput 40.39 
samples/s, lr 27.43
   [Epoch 0 Batch 5/372] current loss 14.34, ppl 1695623.66, throughput 35.59 
samples/s, lr 31.71
   [Epoch 0 Batch 6/372] current loss 9.40, ppl 12113.46, throughput 35.10 
samples/s, lr 32.14
   [Epoch 0 Batch 7/372] current loss 8.56, ppl 5232.00, throughput 37.62 
samples/s, lr 30.00
   [Epoch 0 Batch 8/372] current loss 9.32, ppl 11163.67, throughput 42.00 
samples/s, lr 26.57
   [Epoch 0 Batch 9/372] current loss 8.44, ppl 4642.37, throughput 61.95 
samples/s, lr 17.14
   [Epoch 0 Batch 10/372] current loss 8.92, ppl 7494.76, throughput 41.39 
samples/s, lr 27.00
   ```
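   As an aside, the `clip=0.25` in the hyperparameters above is gradient clipping by global norm (MXNet exposes this as `gluon.utils.clip_global_norm`). A minimal NumPy sketch of the same rescaling rule, for illustration only:

   ```python
   import numpy as np

   def clip_global_norm(grads, max_norm):
       """Rescale a list of gradient arrays so that their combined L2 norm
       does not exceed max_norm; a no-op when it is already below it."""
       total = float(np.sqrt(sum(float((g ** 2).sum()) for g in grads)))
       scale = min(1.0, max_norm / (total + 1e-12))
       return [g * scale for g in grads], total

   # Toy gradients whose global norm is sqrt(4*9 + 3*16) = sqrt(84) ~ 9.17
   grads = [np.full((2, 2), 3.0), np.full((3,), 4.0)]
   clipped, norm = clip_global_norm(grads, max_norm=0.25)
   ```

   Clipping bounds the update size but cannot mask a wrong backward pass, which is why the loss spikes in the log above point at the gradient computation itself.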
