roywei opened a new issue #14838: regression from cudnn upgrade from 7.3.1 to 
7.5.0
URL: https://github.com/apache/incubator-mxnet/issues/14838
 
 
   We recently found a performance regression when training ImageNet with 
ResNet-50 v1 after upgrading from **cuDNN 7.3.1 to 7.5.0**.
   
   **Speed dropped from ~5800 images/s to ~4800 images/s**
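
For scale, here is a quick back-of-the-envelope calculation using the approximate throughput figures above. The variable names are only illustrative, and the numbers are the rounded values quoted in this issue, not new measurements.

```python
# Approximate throughputs reported in this issue, in images/s.
before = 5800.0  # built against cuDNN 7.3.1
after = 4800.0   # built against cuDNN 7.5.0

# Fraction of throughput lost after the upgrade.
drop = (before - after) / before

# Factor by which a fixed amount of work (e.g. one epoch) takes longer.
slowdown = before / after

print(f"throughput drop: {drop:.1%}")
print(f"time per epoch:  {slowdown:.2f}x")
```

With these rounded inputs the drop comes out at roughly 17%, which means each epoch takes about 1.2x as long.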
   
   Environment is AWS DLAMI with AMI ID : ami-2dcceb57 (available in us-east)
   
   
   command:
   ```
   python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 256 \
       --dtype float16 --num-data-workers 40 --num-epochs 90 \
       --gpus 0,1,2,3,4,5,6,7 --lr 0.8 --lr-decay-epoch 30,60,80 \
       --warmup-epochs 5 --last-gamma --mode hybrid --model resnet50_v1b
   ```
   code at: 
https://github.com/rahul003/deep-learning-benchmark-mirror/blob/master/mxnet_benchmark/train_imagenet.py
   
   Our nightly pip packages are impacted because we are now building them 
with cuDNN 7.5.0.
   The stable mxnet pip packages are not impacted.
   
   I'm using this issue to track the regression so everyone can stay updated.
   
   cc
   @szha @DickJC123 @stu1130 @pinaraws

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
