rongzha1 commented on issue #16891: Upgrading MKLDNN to 1.0 causes performance regression.
URL: https://github.com/apache/incubator-mxnet/issues/16891#issuecomment-559501681

Hi @samskalicky,

I used the AWS Deep Learning AMI on a c5.18xlarge instance with Ubuntu 14.04, the same as yours, and built mxnet with the script @leleamol shared.

1. mxnet 1.5: git checkout v1.5.x (commit c9818480680f84daa6e281a974ab263691302ba8)
   Training fails with this error:
   mxnet.base.MXNetError: [08:18:23] src/operator/nn/mkldnn/mkldnn_base.cc:372: Unknown MKLDNN format for 4 dimensions: 53
   Which version did you use? What is the commit id?

2. mxnet 1.6: git checkout v1.6.x (commit 200f0ec8ff55c7264554786822d8467dd9b15174)
   With both the script build and the make command build, training speed is about 1700 samples/sec.

I cannot reproduce the performance regression.

Details:

Building mxnet with the script @leleamol shared, I hit 2 minor issues:

1. Script error: in the line
   source tools/staticbuild/build.sh $1 pip
   sh does not recognize the 'source' command; removing 'source ' makes it work.

2. Link error: can't find /usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.so
   Linking to the gcc 5 library works well:
   ln -s /usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.so /usr/lib/gcc/x86_64-linux-gnu/4.8/libgfortran.so

After the build:
   cd mxnet-build/python && python setup.py install
then run the cifar training. The result is as follows:

[08:45:29] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: data/cifar/train.rec, use 4 threads for decoding..
[08:45:29] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: data/cifar/test.rec, use 4 threads for decoding..
[08:45:29] src/executor/graph_executor.cc:1984: Subgraph backend MKLDNN is activated.
INFO:root:Epoch[0] Batch [0-50] Speed: 1444.97 samples/sec accuracy=0.267770
INFO:root:Epoch[0] Batch [50-100] Speed: 1657.16 samples/sec accuracy=0.381563
INFO:root:Epoch[0] Batch [100-150] Speed: 1629.53 samples/sec accuracy=0.423438
INFO:root:Epoch[0] Batch [150-200] Speed: 1686.67 samples/sec accuracy=0.441875
INFO:root:Epoch[0] Batch [200-250] Speed: 1671.42 samples/sec accuracy=0.462187
INFO:root:Epoch[0] Batch [250-300] Speed: 1723.94 samples/sec accuracy=0.510000
INFO:root:Epoch[0] Batch [300-350] Speed: 1699.66 samples/sec accuracy=0.507500
INFO:root:Epoch[0] Batch [350-400] Speed: 1665.39 samples/sec accuracy=0.523125
INFO:root:Epoch[0] Batch [400-450] Speed: 1724.03 samples/sec accuracy=0.531250
INFO:root:Epoch[0] Batch [450-500] Speed: 1723.66 samples/sec accuracy=0.577187
INFO:root:Epoch[0] Batch [500-550] Speed: 1724.53 samples/sec accuracy=0.574375
INFO:root:Epoch[0] Batch [550-600] Speed: 1721.45 samples/sec accuracy=0.581250
INFO:root:Epoch[0] Batch [600-650] Speed: 1658.77 samples/sec accuracy=0.607500
INFO:root:Epoch[0] Batch [650-700] Speed: 1725.24 samples/sec accuracy=0.606250
INFO:root:Epoch[0] Batch [700-750] Speed: 1726.21 samples/sec accuracy=0.606563

I also built with the make command:
   make -j USE_MKLDNN=1 USE_BLAS=openblas USE_GPERFTOOLS=0
   cd python/ && python setup.py install
The results are as follows:

Archive: cifar10.zip
creating: cifar/
inflating: cifar/test.rec
inflating: cifar/test.lst
inflating: cifar/train.lst
inflating: cifar/train.rec
[07:38:12] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: data/cifar/train.rec, use 4 threads for decoding..
[07:38:12] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: data/cifar/test.rec, use 4 threads for decoding..
[07:38:12] src/executor/graph_executor.cc:1984: Subgraph backend MKLDNN is activated.
INFO:root:Epoch[0] Batch [0-50] Speed: 1416.12 samples/sec accuracy=0.278799
INFO:root:Epoch[0] Batch [50-100] Speed: 1673.98 samples/sec accuracy=0.385313
INFO:root:Epoch[0] Batch [100-150] Speed: 1624.87 samples/sec accuracy=0.424687
INFO:root:Epoch[0] Batch [150-200] Speed: 1668.53 samples/sec accuracy=0.438750
INFO:root:Epoch[0] Batch [200-250] Speed: 1664.30 samples/sec accuracy=0.478438
INFO:root:Epoch[0] Batch [250-300] Speed: 1696.48 samples/sec accuracy=0.511250
INFO:root:Epoch[0] Batch [300-350] Speed: 1701.83 samples/sec accuracy=0.517188
INFO:root:Epoch[0] Batch [350-400] Speed: 1616.46 samples/sec accuracy=0.545000
INFO:root:Epoch[0] Batch [400-450] Speed: 1697.75 samples/sec accuracy=0.556875
INFO:root:Epoch[0] Batch [450-500] Speed: 1703.83 samples/sec accuracy=0.575625
INFO:root:Epoch[0] Batch [500-550] Speed: 1703.13 samples/sec accuracy=0.572812
INFO:root:Epoch[0] Batch [550-600] Speed: 1699.32 samples/sec accuracy=0.587187
INFO:root:Epoch[0] Batch [600-650] Speed: 1682.87 samples/sec accuracy=0.604688
INFO:root:Epoch[0] Batch [650-700] Speed: 1671.12 samples/sec accuracy=0.612187
INFO:root:Epoch[0] Batch [700-750] Speed: 1705.85 samples/sec accuracy=0.611875
INFO:root:Epoch[0] Train-accuracy=0.516964
INFO:root:Epoch[0] Time cost=30.561
INFO:root:Epoch[0] Validation-accuracy=0.628085

Screenshot attached.
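
To compare the two builds by a single number rather than by eye, the Speed values in logs like the ones above can be averaged. The following is only a minimal sketch, assuming the log line format shown above; the names `SPEED_RE`, `mean_speed`, and `skip_warmup` are hypothetical helpers, not part of mxnet, and the sample text is just the first three entries of the script-build log.

```python
import re
from statistics import mean

# Matches the throughput field in log lines such as:
# "INFO:root:Epoch[0] Batch [50-100] Speed: 1657.16 samples/sec accuracy=0.381563"
SPEED_RE = re.compile(r"Speed:\s*([0-9]+(?:\.[0-9]+)?)\s*samples/sec")


def mean_speed(log_text, skip_warmup=1):
    """Average the samples/sec values found in log_text.

    skip_warmup drops the first N entries, since the first interval
    in both logs above is noticeably slower than the rest.
    """
    speeds = [float(s) for s in SPEED_RE.findall(log_text)]
    steady = speeds[skip_warmup:]
    if not steady:
        raise ValueError("no Speed entries left after skipping warm-up")
    return mean(steady)


# First three entries of the script-build log, copied from above.
sample = """\
INFO:root:Epoch[0] Batch [0-50] Speed: 1444.97 samples/sec accuracy=0.267770
INFO:root:Epoch[0] Batch [50-100] Speed: 1657.16 samples/sec accuracy=0.381563
INFO:root:Epoch[0] Batch [100-150] Speed: 1629.53 samples/sec accuracy=0.423438
"""

print(f"mean speed (warm-up skipped): {mean_speed(sample):.2f} samples/sec")
```

Running the same function over the full log of each build gives one figure per build, which makes a regression (or the lack of one) easier to state.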
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
