roywei opened a new issue #14838: regression from cudnn upgrade from 7.3.1 to 7.5.0
URL: https://github.com/apache/incubator-mxnet/issues/14838

We recently found a performance regression when training ImageNet with resnet50v1 after upgrading from **cuDNN 7.3.1 to 7.5.0**.

**Speed dropped from ~5800 images/s to ~4800 images/s.**

The environment is the AWS DLAMI with AMI ID ami-2dcceb57 (available in us-east).

Command:
```
python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 256 --dtype float16 --num-data-workers 40 --num-epochs 90 --gpus 0,1,2,3,4,5,6,7 --lr 0.8 --lr-decay-epoch 30,60,80 --warmup-epochs 5 --last-gamma --mode hybrid --model resnet50_v1b
```
Code at: https://github.com/rahul003/deep-learning-benchmark-mirror/blob/master/mxnet_benchmark/train_imagenet.py

Our nightly pip packages are impacted because we are now building them with cuDNN 7.5.0. The stable MXNet pip packages are not impacted.

I'm using this issue to keep track of the regression so everyone can stay updated.

cc @szha @DickJC123 @stu1130 @pinaraws
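
For a rough sense of scale, the approximate throughput figures above work out to about a 17% drop in images/s. Below is a minimal sketch of that arithmetic, assuming the ~5800 and ~4800 images/s numbers are taken at face value; the function and variable names are illustrative only and are not part of the benchmark script.

```python
def relative_slowdown(before: float, after: float) -> float:
    """Fraction of throughput lost going from `before` to `after` images/s."""
    return (before - after) / before


# Approximate figures reported in this issue (images/s).
baseline = 5800.0   # cuDNN 7.3.1
regressed = 4800.0  # cuDNN 7.5.0

slowdown = relative_slowdown(baseline, regressed)
# Equivalent increase in time per image (and so per epoch).
time_increase = baseline / regressed - 1.0

print(f"throughput drop: {slowdown:.1%}")             # throughput drop: 17.2%
print(f"time per image increase: {time_increase:.1%}")  # time per image increase: 20.8%
```

The second figure is the one that matters for wall-clock time: with these numbers, a 90-epoch run would take roughly a fifth longer.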
