rongzha1 commented on issue #16891: Upgrading MKLDNN to 1.0 causes performance 
regression.
URL: 
https://github.com/apache/incubator-mxnet/issues/16891#issuecomment-559501681
 
 
   Hi @samskalicky, I used the same setup as you: the AWS Deep Learning AMI on 
a c5.18xlarge instance with Ubuntu 14.04.
   I built MXNet with the script @leleamol shared:
   
   1. mxnet1.5:
         git checkout v1.5.x (commit c9818480680f84daa6e281a974ab263691302ba8)
         Training fails with this error:
         mxnet.base.MXNetError: [08:18:23] src/operator/nn/mkldnn/mkldnn_base.cc:372: Unknown MKLDNN format for 4 dimensions: 53
         So which version did you use? What is the commit ID?
   
   2. mxnet1.6:
        git checkout v1.6.x (commit 200f0ec8ff55c7264554786822d8467dd9b15174)
        With both the script build and the make build, training speed is 
about 1700 samples/sec.
   
   So I cannot reproduce the performance regression.
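Since the results depend on which build is installed, it may help to print the version and MKL-DNN status on each machine before comparing numbers. The following is only a sketch: `describe_mxnet_build` is a hypothetical helper name, and it assumes an MXNet build that ships the `mxnet.runtime` module (1.5 or later). It does not report the commit ID.

```python
def describe_mxnet_build():
    """Return version and MKL-DNN status of the installed MXNet.

    Returns None when MXNet (or its runtime feature module) cannot be imported.
    """
    try:
        import mxnet as mx
        from mxnet.runtime import Features
    except ImportError:
        return None
    return {
        "version": mx.__version__,
        # True when the library was compiled with USE_MKLDNN=1.
        "mkldnn": Features().is_enabled("MKLDNN"),
    }


print(describe_mxnet_build())
```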
   
   
   Details:
   Building with the script @leleamol shared hit 2 minor issues:
   1. Script error: in 'source tools/staticbuild/build.sh $1 pip', sh does not 
recognize the 'source' command;
      removing 'source ' makes it work.
   2. Link error: can't find /usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.so
       Linking the gcc5 lib works:
       ln -s /usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.so /usr/lib/gcc/x86_64-linux-gnu/4.8/libgfortran.so
   After the build: cd mxnet-build/python && python setup.py install
   Then run the cifar training.
   
   The result is as follows:
   [08:45:29] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: 
data/cifar/train.rec, use 4 threads for decoding..
   [08:45:29] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: 
data/cifar/test.rec, use 4 threads for decoding..
   [08:45:29] src/executor/graph_executor.cc:1984: Subgraph backend MKLDNN is 
activated.
   INFO:root:Epoch[0] Batch [0-50]      Speed: 1444.97 samples/sec      
accuracy=0.267770
   INFO:root:Epoch[0] Batch [50-100]    Speed: 1657.16 samples/sec      
accuracy=0.381563
   INFO:root:Epoch[0] Batch [100-150]   Speed: 1629.53 samples/sec      
accuracy=0.423438
   INFO:root:Epoch[0] Batch [150-200]   Speed: 1686.67 samples/sec      
accuracy=0.441875
   INFO:root:Epoch[0] Batch [200-250]   Speed: 1671.42 samples/sec      
accuracy=0.462187
   INFO:root:Epoch[0] Batch [250-300]   Speed: 1723.94 samples/sec      
accuracy=0.510000
   INFO:root:Epoch[0] Batch [300-350]   Speed: 1699.66 samples/sec      
accuracy=0.507500
   INFO:root:Epoch[0] Batch [350-400]   Speed: 1665.39 samples/sec      
accuracy=0.523125
   INFO:root:Epoch[0] Batch [400-450]   Speed: 1724.03 samples/sec      
accuracy=0.531250
   INFO:root:Epoch[0] Batch [450-500]   Speed: 1723.66 samples/sec      
accuracy=0.577187
   INFO:root:Epoch[0] Batch [500-550]   Speed: 1724.53 samples/sec      
accuracy=0.574375
   INFO:root:Epoch[0] Batch [550-600]   Speed: 1721.45 samples/sec      
accuracy=0.581250
   INFO:root:Epoch[0] Batch [600-650]   Speed: 1658.77 samples/sec      
accuracy=0.607500
   INFO:root:Epoch[0] Batch [650-700]   Speed: 1725.24 samples/sec      
accuracy=0.606250
   INFO:root:Epoch[0] Batch [700-750]   Speed: 1726.21 samples/sec      
accuracy=0.606563
   
   I also built with the make command:
   make -j USE_MKLDNN=1 USE_BLAS=openblas USE_GPERFTOOLS=0
   cd python/ && python setup.py install
   The results are as follows:
   Archive:  cifar10.zip
      creating: cifar/
     inflating: cifar/test.rec          
     inflating: cifar/test.lst          
     inflating: cifar/train.lst         
     inflating: cifar/train.rec         
   [07:38:12] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: 
data/cifar/train.rec, use 4 threads for decoding..
   [07:38:12] src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: 
data/cifar/test.rec, use 4 threads for decoding..
   [07:38:12] src/executor/graph_executor.cc:1984: Subgraph backend MKLDNN is 
activated.
   INFO:root:Epoch[0] Batch [0-50]      Speed: 1416.12 samples/sec      
accuracy=0.278799
   INFO:root:Epoch[0] Batch [50-100]    Speed: 1673.98 samples/sec      
accuracy=0.385313
   INFO:root:Epoch[0] Batch [100-150]   Speed: 1624.87 samples/sec      
accuracy=0.424687
   INFO:root:Epoch[0] Batch [150-200]   Speed: 1668.53 samples/sec      
accuracy=0.438750
   INFO:root:Epoch[0] Batch [200-250]   Speed: 1664.30 samples/sec      
accuracy=0.478438
   INFO:root:Epoch[0] Batch [250-300]   Speed: 1696.48 samples/sec      
accuracy=0.511250
   INFO:root:Epoch[0] Batch [300-350]   Speed: 1701.83 samples/sec      
accuracy=0.517188
   INFO:root:Epoch[0] Batch [350-400]   Speed: 1616.46 samples/sec      
accuracy=0.545000
   INFO:root:Epoch[0] Batch [400-450]   Speed: 1697.75 samples/sec      
accuracy=0.556875
   INFO:root:Epoch[0] Batch [450-500]   Speed: 1703.83 samples/sec      
accuracy=0.575625
   INFO:root:Epoch[0] Batch [500-550]   Speed: 1703.13 samples/sec      
accuracy=0.572812
   INFO:root:Epoch[0] Batch [550-600]   Speed: 1699.32 samples/sec      
accuracy=0.587187
   INFO:root:Epoch[0] Batch [600-650]   Speed: 1682.87 samples/sec      
accuracy=0.604688
   INFO:root:Epoch[0] Batch [650-700]   Speed: 1671.12 samples/sec      
accuracy=0.612187
   INFO:root:Epoch[0] Batch [700-750]   Speed: 1705.85 samples/sec      
accuracy=0.611875
   INFO:root:Epoch[0] Train-accuracy=0.516964
   INFO:root:Epoch[0] Time cost=30.561
   INFO:root:Epoch[0] Validation-accuracy=0.628085
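
To compare two builds numerically rather than by eye, the Speed readings in logs like these can be averaged. This is a minimal sketch, not part of the original runs: `mean_speed` and `skip_warmup` are hypothetical names, and skipping the first reading as warm-up is my assumption. The sample lines are copied from the script-build log.

```python
import re
from statistics import mean

# Matches the throughput field in the training log,
# e.g. "Speed: 1657.16 samples/sec".
SPEED_RE = re.compile(r"Speed:\s*([0-9]+(?:\.[0-9]+)?)\s*samples/sec")


def mean_speed(log_text, skip_warmup=1):
    """Return the mean samples/sec found in log_text.

    The first `skip_warmup` readings are dropped, because the first
    batches are usually slower than steady state.
    """
    speeds = [float(value) for value in SPEED_RE.findall(log_text)]
    steady = speeds[skip_warmup:]
    if not steady:
        raise ValueError("no Speed readings left after skipping warm-up")
    return mean(steady)


sample = """\
INFO:root:Epoch[0] Batch [0-50]      Speed: 1444.97 samples/sec      accuracy=0.267770
INFO:root:Epoch[0] Batch [50-100]    Speed: 1657.16 samples/sec      accuracy=0.381563
INFO:root:Epoch[0] Batch [100-150]   Speed: 1629.53 samples/sec      accuracy=0.423438
"""

print(f"{mean_speed(sample):.2f} samples/sec")
```

The regular expression is applied to the whole text, so it also works on logs whose lines were wrapped after the "samples/sec" field.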
   
   Screenshots attached:
   ![1 6_make1](https://user-images.githubusercontent.com/28431214/69811413-fb294880-1228-11ea-86eb-6aac58533845.png)
   ![1 6_make2](https://user-images.githubusercontent.com/28431214/69811414-fbc1df00-1228-11ea-8b02-a5df49306fee.png)
   ![1 6_script_build](https://user-images.githubusercontent.com/28431214/69811415-fbc1df00-1228-11ea-8eeb-f86d15ac9a8c.png)
   
   
