[GitHub] [incubator-mxnet] stu1130 opened a new pull request #15142: bump up cudnn version

GitBox Mon, 03 Jun 2019 18:12:56 -0700

stu1130 opened a new pull request #15142: bump up cudnn version
URL: https://github.com/apache/incubator-mxnet/pull/15142
 
 
   ## Description ##
   Run three models: ResNet50 with ImageNet, LSTM with PTB, and MLP with MNIST.
   Performance is shown below.
   Environment: P3.16xlarge Deep Learning Base AMI
   Codebase: commit 1540a84 for CUDA 9/9.2/10 
1540a84f1eca937235c51b507ea716c614f40805 for CUDA 10
   I also applied the #14837 PR change
   The unit of throughput is samples per second.
   Each throughput is calculated as the average of 5 runs.
   
   ### ResNet ###
   **model**: ResNet50
   **dataset**: ImageNet
   **number of GPUs**: 8
   **epochs**: 3 (only to test throughput)
   **preprocess command**: sudo pip install gluoncv==0.2.0b20180625
   **command**: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
   **github repo**: https://github.com/rahul003/deep-learning-benchmark-mirror.git
   
   CUDA + MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) | cuDNN 7.5.1/NCCL 2.3.4 (samples/s) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10.1 | 2831.23331 | 2817.18815 | 0.499%  |
   | CUDA 10 | 2784.42731 | 2831.54405 | -1.664%  |
   | CUDA 9.2 | 2823.64928 | 2832.36803 | -0.308% |
   | CUDA 9.0 | 2807.82859 | 2815.83939 | -0.284% |
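
The "Performance Difference" column appears to be the relative change of the cuDNN 7.6.0 throughput against the cuDNN 7.5.1 baseline. A minimal sketch for rechecking a row, assuming that definition (the helper name `percent_difference` is hypothetical and not part of the benchmark repo):

```python
def percent_difference(new, baseline):
    """Relative change of `new` against `baseline`, in percent.

    Assumed definition: (new - baseline) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline throughput must be non-zero")
    return (new - baseline) / baseline * 100.0


# Two ResNet50 rows copied from the table above (samples per second):
# (cuDNN 7.6.0/NCCL 2.4.2, cuDNN 7.5.1/NCCL 2.3.4)
resnet50 = {
    "CUDA 10.1": (2831.23331, 2817.18815),
    "CUDA 10": (2784.42731, 2831.54405),
}

for cuda, (cudnn_760, cudnn_751) in resnet50.items():
    diff = percent_difference(cudnn_760, cudnn_751)
    print(f"{cuda}: {diff:+.3f}%")
```

A positive value means the newer cuDNN/NCCL pair was faster. The same helper can be applied to the LSTM and MLP tables.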
   
   Reference (only 3 runs)
   without MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) |
   |:----------|:------------------------:|
   | CUDA 10.1 | 2864.95587 | 
   | CUDA 10 | 2859.00876| 
   | CUDA 9.2 | 2908.62222 |
   | CUDA 9.0| 2858.38916 | 
   
   ### LSTM ###
   **model**: LSTM
   **dataset**: PTB (Penn Treebank)
   **number of GPUs**: 1
   **epochs**: 10
   **command**:
   python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
   python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local
   
   CUDA + MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) | cuDNN 7.5.1/NCCL 2.3.4 (samples/s) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10.1 | 1018.89083 | 1015.61785| 0.322%  |
   | CUDA 10 | 852.80333 | 847.98222| 0.569%  |
   | CUDA 9.2 | 1011.61122 | 1005.25185 | 0.632% |
   | CUDA 9.0| 992.34674| 1002.59081  | -1.021% | 
   
   **CUDA 10 has a performance regression; see #14725 for details.**
   
   Reference (only 3 runs)
   without MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) |
   |:----------|:------------------------:|
   | CUDA 10.1 | 1010.1654 | 
   | CUDA 10 | 846.05572| 
   | CUDA 9.2 | 1007.27178 |
   | CUDA 9.0| 978.18158 | 
   
   
   ### MLP ###
   **model**: 3 dense layers with num_hidden=64 and relu as activation
   **dataset**: MNIST
   **number of GPUs**: 1
   **epochs**: 10
   **command**:
   python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist
   
   CUDA + MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) | cuDNN 7.5.1/NCCL 2.3.4 (samples/s) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10.1 | 4438.0091 | 4422.72478 | 0.346%  |
   | CUDA 10 | 4433.65315 | 4638.73873 | -4.421%  |
   | CUDA 9.2 | 4439.18763 | 4425.37599 | 0.312% |
   | CUDA 9.0| 4505.45334 | 4421.82611 | 1.891%| 
   
   Reference (only 3 runs)
   without MKLDNN
   
   | CUDA version | cuDNN 7.6.0/NCCL 2.4.2 (samples/s) |
   |:----------|:------------------------:|
   | CUDA 10.1 | 4515.74059 | 
   | CUDA 10 | 4349.40602| 
   | CUDA 9.2 | 4492.37239 |
   | CUDA 9.0| 4211.6375 | 
   
   
   ## Comments ##
   @szha @lanking520


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
