haojin2 commented on issue #14830: Use env var to enforce safe accumulation in ReduceAxesCompute
URL: https://github.com/apache/incubator-mxnet/pull/14830#issuecomment-493229509
 
 
   Out of curiosity, I ran the following experiment on my p2.8xlarge (a script sketch of both runs follows the list):
   1) Check out the commit right before the softmax PR, do a fresh build from source, then run the script.
   2) Check out the commit of the softmax PR, do a fresh build from source, then run the script.
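   Roughly, the two runs can be scripted like this (the commit hashes, benchmark command, and paths are taken from the logs below; the build command is an assumption, substitute your usual one — this is a sketch, not the exact commands I ran):
   ```python
   # Hypothetical A/B driver for the two builds; just the recipe in script
   # form. Paths and hashes match the logs below; "make -j" is an assumed
   # build step, replace it with your normal MXNet build command.
   import subprocess

   MXNET = "/home/ubuntu/incubator-mxnet"
   BENCH_DIR = "/home/ubuntu/deeplearning-benchmark"
   BENCH = ["python", "word_language_model/word_language_model.py",
            "--gpus", "8", "--nhid", "650", "--emsize", "650",
            "--dropout", "0.5", "--epochs", "40",
            "--data", "word_language_model/data/ptb.",
            "--mode", "imperative", "--kvstore", "device"]

   for commit in ["f9c436be2", "862cbc67a"]:  # before / at the softmax PR
       subprocess.run(["git", "checkout", commit], cwd=MXNET, check=True)
       subprocess.run(["make", "-j"], cwd=MXNET, check=True)  # assumed build step
       subprocess.run(BENCH, cwd=BENCH_DIR, check=True)
   ```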
   Here are the logs:
   Built from source at the commit before the softmax PR:
   ```
   ubuntu@ip-162-32-28-44:~/deeplearning-benchmark$ python word_language_model/word_language_model.py --gpus 8 --nhid 650 --emsize 650 --dropout 0.5 --epochs 40 --data word_language_model/data/ptb. --mode imperative --kvstore device
   INFO:root:[Epoch 0] time cost 30.57s, valid loss 6.43, valid ppl 618.67
   INFO:root:test loss 6.39, test ppl 596.96
   INFO:root:[Epoch 1] time cost 28.15s, valid loss 6.05, valid ppl 424.32
   INFO:root:test loss 6.03, test ppl 416.74
   INFO:root:[Epoch 2] time cost 28.66s, valid loss 5.76, valid ppl 317.45
   INFO:root:test loss 5.74, test ppl 310.12
   INFO:root:[Epoch 3] time cost 28.61s, valid loss 5.60, valid ppl 270.85
   INFO:root:test loss 5.58, test ppl 264.98
   INFO:root:[Epoch 4] time cost 28.92s, valid loss 5.44, valid ppl 229.80
   INFO:root:test loss 5.40, test ppl 221.53
   INFO:root:[Epoch 5] time cost 28.94s, valid loss 5.33, valid ppl 207.46
   INFO:root:test loss 5.31, test ppl 202.24
   INFO:root:[Epoch 6] time cost 29.13s, valid loss 5.26, valid ppl 193.18
   INFO:root:test loss 5.24, test ppl 188.84
   INFO:root:[Epoch 7] time cost 28.76s, valid loss 5.19, valid ppl 178.78
   INFO:root:test loss 5.16, test ppl 174.57
   INFO:root:[Epoch 8] time cost 29.33s, valid loss 5.13, valid ppl 169.77
   INFO:root:test loss 5.11, test ppl 165.58
   INFO:root:[Epoch 9] time cost 28.92s, valid loss 5.09, valid ppl 162.16
   INFO:root:test loss 5.06, test ppl 158.30
   INFO:root:[Epoch 10] time cost 29.29s, valid loss 5.03, valid ppl 153.41
   INFO:root:test loss 5.00, test ppl 147.82
   INFO:root:[Epoch 11] time cost 29.02s, valid loss 5.01, valid ppl 149.68
   INFO:root:test loss 4.97, test ppl 144.52
   INFO:root:[Epoch 12] time cost 29.12s, valid loss 4.99, valid ppl 146.27
   INFO:root:test loss 4.95, test ppl 141.74
   INFO:root:[Epoch 13] time cost 29.10s, valid loss 4.95, valid ppl 141.57
   INFO:root:test loss 4.92, test ppl 136.56
   INFO:root:[Epoch 14] time cost 29.19s, valid loss 4.93, valid ppl 139.02
   INFO:root:test loss 4.90, test ppl 134.21
   INFO:root:[Epoch 15] time cost 29.02s, valid loss 4.92, valid ppl 137.63
   INFO:root:test loss 4.89, test ppl 132.71
   INFO:root:[Epoch 16] time cost 29.45s, valid loss 4.90, valid ppl 134.44
   INFO:root:test loss 4.86, test ppl 128.75
   INFO:root:[Epoch 17] time cost 28.85s, valid loss 4.87, valid ppl 130.48
   INFO:root:test loss 4.83, test ppl 124.94
   INFO:root:[Epoch 18] time cost 29.18s, valid loss 4.87, valid ppl 130.76
   INFO:root:[Epoch 19] time cost 29.32s, valid loss 4.85, valid ppl 127.34
   INFO:root:test loss 4.80, test ppl 121.90
   INFO:root:[Epoch 20] time cost 29.29s, valid loss 4.84, valid ppl 126.82
   INFO:root:test loss 4.80, test ppl 121.36
   INFO:root:[Epoch 21] time cost 28.72s, valid loss 4.84, valid ppl 126.15
   INFO:root:test loss 4.79, test ppl 120.70
   INFO:root:[Epoch 22] time cost 29.30s, valid loss 4.83, valid ppl 125.70
   INFO:root:test loss 4.79, test ppl 120.15
   INFO:root:[Epoch 23] time cost 29.05s, valid loss 4.83, valid ppl 125.46
   INFO:root:test loss 4.79, test ppl 119.92
   INFO:root:[Epoch 24] time cost 29.18s, valid loss 4.83, valid ppl 124.62
   INFO:root:test loss 4.78, test ppl 119.24
   INFO:root:[Epoch 25] time cost 29.04s, valid loss 4.83, valid ppl 124.73
   INFO:root:[Epoch 26] time cost 29.33s, valid loss 4.82, valid ppl 124.55
   INFO:root:test loss 4.78, test ppl 118.98
   INFO:root:[Epoch 27] time cost 28.93s, valid loss 4.82, valid ppl 124.35
   INFO:root:test loss 4.78, test ppl 118.82
   INFO:root:[Epoch 28] time cost 29.26s, valid loss 4.82, valid ppl 124.18
   INFO:root:test loss 4.78, test ppl 118.66
   INFO:root:[Epoch 29] time cost 28.90s, valid loss 4.82, valid ppl 124.09
   INFO:root:test loss 4.78, test ppl 118.56
   INFO:root:[Epoch 30] time cost 29.43s, valid loss 4.82, valid ppl 123.99
   INFO:root:test loss 4.77, test ppl 118.47
   INFO:root:[Epoch 31] time cost 28.97s, valid loss 4.82, valid ppl 123.91
   INFO:root:test loss 4.77, test ppl 118.44
   INFO:root:[Epoch 32] time cost 29.27s, valid loss 4.82, valid ppl 123.66
   INFO:root:test loss 4.77, test ppl 118.21
   INFO:root:[Epoch 33] time cost 28.87s, valid loss 4.82, valid ppl 123.63
   INFO:root:test loss 4.77, test ppl 118.19
   INFO:root:[Epoch 34] time cost 29.28s, valid loss 4.82, valid ppl 123.71
   INFO:root:[Epoch 35] time cost 29.16s, valid loss 4.82, valid ppl 123.58
   INFO:root:test loss 4.77, test ppl 118.09
   INFO:root:[Epoch 36] time cost 29.39s, valid loss 4.82, valid ppl 123.54
   INFO:root:test loss 4.77, test ppl 118.05
   INFO:root:[Epoch 37] time cost 29.09s, valid loss 4.82, valid ppl 123.55
   INFO:root:[Epoch 38] time cost 29.43s, valid loss 4.82, valid ppl 123.47
   INFO:root:test loss 4.77, test ppl 117.99
   INFO:root:[Epoch 39] time cost 29.04s, valid loss 4.82, valid ppl 123.44
   INFO:root:test loss 4.77, test ppl 117.97
   INFO:root:Best test loss 4.77, test ppl 117.97
   ubuntu@ip-162-32-28-44:~/deeplearning-benchmark$ cd ..
   ubuntu@ip-162-32-28-44:~$ cd incubator-mxnet/
   ubuntu@ip-162-32-28-44:~/incubator-mxnet$ git status
   HEAD detached at f9c436be2
   nothing to commit, working tree clean
   ubuntu@ip-162-32-28-44:~/incubator-mxnet$ python
   Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
   [GCC 8.2.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet as mx
   >>> mx
   <module 'mxnet' from '/home/ubuntu/incubator-mxnet/python/mxnet/__init__.py'>
   >>> quit()
   ubuntu@ip-162-32-28-44:~/incubator-mxnet$ git show
   commit f9c436be2689ac809aab5422b41f2fd768e8c4bc (HEAD)
   Author: Przemyslaw Tredak <[email protected]>
   Date:   Tue Feb 19 16:03:36 2019 -0800
   
       Fix req=null in SliceLikeBackward (#14209)
   
   diff --git a/src/operator/tensor/matrix_op-inl.h b/src/operator/tensor/matrix_op-inl.h
   index 97c4fa556..28ed4215e 100644
   --- a/src/operator/tensor/matrix_op-inl.h
   +++ b/src/operator/tensor/matrix_op-inl.h
   @@ -1389,13 +1389,15 @@ void SliceLikeBackward(const nnvm::NodeAttrs& attrs,
      CHECK_EQ(inputs.size(), 1U);
      CHECK_EQ(outputs.size(), 2U);
      CHECK_EQ(req.size(), 2U);
   -  if (req[0] == kNullOp) return;
      using namespace mshadow;
      Stream<xpu>* s = ctx.get_stream<xpu>();
   +  if (req[1] != kNullOp && req[1] != kAddTo) {
   +    Fill(s, outputs[1], req[1], 0);  // Second input not relavant to gradients.
   +  }
   +  if (req[0] == kNullOp) return;
      const TBlob& ograd = inputs[0];
      const TBlob& igrad = outputs[0];
      const SliceLikeParam& param = nnvm::get<SliceLikeParam>(attrs.parsed);
   -  Fill(s, outputs[1], req[1], 0);  // Second input not relavant to gradients.
      if (req[0] == kWriteTo) {
        Fill(s, igrad, req[0], 0);
      } else if (req[0] == kWriteInplace) {
   ```
   Built from source at the commit of the softmax PR:
   ```
   ubuntu@ip-162-32-28-44:~/deeplearning-benchmark$ python word_language_model/word_language_model.py --gpus 8 --nhid 650 --emsize 650 --dropout 0.5 --epochs 40 --data word_language_model/data/ptb. --mode imperative --kvstore device
   INFO:root:[Epoch 0] time cost 30.52s, valid loss 6.60, valid ppl 737.45
   INFO:root:test loss 6.57, test ppl 714.20
   INFO:root:[Epoch 1] time cost 28.03s, valid loss 6.13, valid ppl 461.04
   INFO:root:test loss 6.10, test ppl 446.37
   INFO:root:[Epoch 2] time cost 28.50s, valid loss 5.81, valid ppl 332.27
   INFO:root:test loss 5.77, test ppl 320.43
   INFO:root:[Epoch 3] time cost 28.28s, valid loss 5.58, valid ppl 264.90
   INFO:root:test loss 5.54, test ppl 254.72
   INFO:root:[Epoch 4] time cost 28.83s, valid loss 5.41, valid ppl 224.44
   INFO:root:test loss 5.38, test ppl 217.38
   INFO:root:[Epoch 5] time cost 28.54s, valid loss 5.34, valid ppl 208.28
   INFO:root:test loss 5.30, test ppl 201.23
   INFO:root:[Epoch 6] time cost 28.98s, valid loss 5.24, valid ppl 188.28
   INFO:root:test loss 5.21, test ppl 182.98
   INFO:root:[Epoch 7] time cost 28.58s, valid loss 5.18, valid ppl 177.91
   INFO:root:test loss 5.15, test ppl 172.79
   INFO:root:[Epoch 8] time cost 29.14s, valid loss 5.14, valid ppl 170.55
   INFO:root:test loss 5.10, test ppl 164.79
   INFO:root:[Epoch 9] time cost 28.61s, valid loss 5.08, valid ppl 160.70
   INFO:root:test loss 5.05, test ppl 155.34
   INFO:root:[Epoch 10] time cost 28.87s, valid loss 5.03, valid ppl 153.40
   INFO:root:test loss 5.00, test ppl 148.69
   INFO:root:[Epoch 11] time cost 28.71s, valid loss 5.00, valid ppl 149.02
   INFO:root:test loss 4.97, test ppl 144.28
   INFO:root:[Epoch 12] time cost 28.87s, valid loss 4.97, valid ppl 143.34
   INFO:root:test loss 4.93, test ppl 137.89
   INFO:root:[Epoch 13] time cost 28.80s, valid loss 4.95, valid ppl 140.72
   INFO:root:test loss 4.91, test ppl 136.31
   INFO:root:[Epoch 14] time cost 29.21s, valid loss 4.93, valid ppl 137.89
   INFO:root:test loss 4.89, test ppl 132.43
   INFO:root:[Epoch 15] time cost 28.84s, valid loss 4.91, valid ppl 135.74
   INFO:root:test loss 4.87, test ppl 130.02
   INFO:root:[Epoch 16] time cost 29.02s, valid loss 4.89, valid ppl 133.00
   INFO:root:test loss 4.85, test ppl 127.47
   INFO:root:[Epoch 17] time cost 28.74s, valid loss 4.87, valid ppl 130.48
   INFO:root:test loss 4.83, test ppl 124.78
   INFO:root:[Epoch 18] time cost 28.89s, valid loss 4.86, valid ppl 129.04
   INFO:root:test loss 4.82, test ppl 124.39
   INFO:root:[Epoch 19] time cost 28.88s, valid loss 4.85, valid ppl 127.15
   INFO:root:test loss 4.81, test ppl 122.78
   INFO:root:[Epoch 20] time cost 29.10s, valid loss 4.83, valid ppl 124.98
   INFO:root:test loss 4.79, test ppl 120.01
   INFO:root:[Epoch 21] time cost 28.94s, valid loss 4.83, valid ppl 125.68
   INFO:root:[Epoch 22] time cost 29.27s, valid loss 4.81, valid ppl 122.71
   INFO:root:test loss 4.76, test ppl 116.87
   INFO:root:[Epoch 23] time cost 28.86s, valid loss 4.81, valid ppl 122.14
   INFO:root:test loss 4.76, test ppl 116.37
   INFO:root:[Epoch 24] time cost 29.21s, valid loss 4.80, valid ppl 121.99
   INFO:root:test loss 4.76, test ppl 116.17
   INFO:root:[Epoch 25] time cost 28.77s, valid loss 4.80, valid ppl 121.56
   INFO:root:test loss 4.75, test ppl 115.79
   INFO:root:[Epoch 26] time cost 29.35s, valid loss 4.80, valid ppl 121.45
   INFO:root:test loss 4.75, test ppl 115.69
   INFO:root:[Epoch 27] time cost 28.77s, valid loss 4.80, valid ppl 121.28
   INFO:root:test loss 4.75, test ppl 115.41
   INFO:root:[Epoch 28] time cost 29.18s, valid loss 4.79, valid ppl 120.86
   INFO:root:test loss 4.75, test ppl 115.07
   INFO:root:[Epoch 29] time cost 28.79s, valid loss 4.79, valid ppl 120.70
   INFO:root:test loss 4.74, test ppl 114.90
   INFO:root:[Epoch 30] time cost 29.17s, valid loss 4.79, valid ppl 120.60
   INFO:root:test loss 4.74, test ppl 114.86
   INFO:root:[Epoch 31] time cost 28.73s, valid loss 4.79, valid ppl 120.13
   INFO:root:test loss 4.74, test ppl 114.50
   INFO:root:[Epoch 32] time cost 29.13s, valid loss 4.79, valid ppl 119.99
   INFO:root:test loss 4.74, test ppl 114.25
   INFO:root:[Epoch 33] time cost 28.82s, valid loss 4.79, valid ppl 120.21
   INFO:root:[Epoch 34] time cost 29.09s, valid loss 4.78, valid ppl 119.59
   INFO:root:test loss 4.73, test ppl 113.82
   INFO:root:[Epoch 35] time cost 28.73s, valid loss 4.78, valid ppl 119.63
   INFO:root:[Epoch 36] time cost 29.12s, valid loss 4.78, valid ppl 119.59
   INFO:root:[Epoch 37] time cost 28.74s, valid loss 4.78, valid ppl 119.56
   INFO:root:test loss 4.73, test ppl 113.79
   INFO:root:[Epoch 38] time cost 28.81s, valid loss 4.78, valid ppl 119.54
   INFO:root:test loss 4.73, test ppl 113.76
   INFO:root:[Epoch 39] time cost 28.74s, valid loss 4.78, valid ppl 119.51
   INFO:root:test loss 4.73, test ppl 113.73
   INFO:root:Best test loss 4.73, test ppl 113.73
   ubuntu@ip-162-32-28-44:~/deeplearning-benchmark$ cd ../incubator-mxnet
   ubuntu@ip-162-32-28-44:~/incubator-mxnet$ python
   Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
   [GCC 8.2.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet as mx
   >>> mx
   <module 'mxnet' from '/home/ubuntu/incubator-mxnet/python/mxnet/__init__.py'>
   >>> quit()
   ubuntu@ip-162-32-28-44:~/incubator-mxnet$ git show
   commit 862cbc67aacf81990b8c885847686a4c3c734cd3 (HEAD)
   Author: Sheng Zha <[email protected]>
   Date:   Wed Feb 20 16:37:12 2019 -0800
   
       softmax for fp16 with fp32 accumulator (#14098)
       
       * softmax for fp16 with fp32 accumulator
       
       * return AType in kernel
       
       * add dtype
       
       * kernel
       
       * grad use in-out only when dtype override
       
       * simplify infer type
       
       * address comments
   ```
   So here I'm actually seeing an INCREASE in final model performance after adding in the softmax PR, with all other variables controlled... @nswamy Could you run exactly the same experiment on your machine to see if this holds? Your previous experiments did not control all other variables, so the picture may be blurred to some extent...
   @anirudh2290 @eric-haibin-lin FYI
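   For intuition on why the accumulator dtype alone can move these numbers, here is a minimal NumPy sketch (my own illustration, not MXNet's actual kernel): it sums many small fp16 values the way a softmax denominator or a ReduceAxesCompute sum would, once with an fp16 accumulator and once with an fp32 one, the latter being the "safe accumulation" this PR puts behind an env var.
   ```python
   # Minimal NumPy illustration (not MXNet's kernel): with an fp16 running
   # sum, rounding error compounds and the sum eventually stalls; keeping
   # the accumulator in fp32 and casting back at the end stays accurate.
   import numpy as np

   x = np.full(10000, 0.1, dtype=np.float16)  # true sum is ~1000

   naive = np.float16(0.0)
   for v in x:              # fp16 accumulator: each add rounds to fp16
       naive = naive + v

   safe = x.astype(np.float32).sum().astype(np.float16)  # fp32 accumulator

   print(naive)                        # stalls far below 1000
   print(safe)                         # ~1000, up to fp16 rounding
   print(x.astype(np.float64).sum())   # fp64 reference
   ```
   The same effect applies to the exp-sums inside softmax, so a small end-to-end ppl shift after the softmax PR is expected, and in this run it happens to be an improvement rather than a regression.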
