wuxun-zhang edited a comment on issue #17159: Performance regression from 1.4.1 to 1.5.1
URL: https://github.com/apache/incubator-mxnet/issues/17159#issuecomment-569933797
 
 
   > @apeforest
   > Is this what you wanted with 1.4.1 pip run, 1.5.1 pip run, 1.4.1 source 
run, 1.5.1 source run.
   > 
   > **pip mxnet-mkl==1.4.1**
   > Model: resnext-50
   > ----Latency----
   > Model Partition Time: 0
   > Model Load Time: 87.3405933380127
   > First Inference: 113.36421966552734
   > p99: 20.80082893371582
   > p90: 19.6378231048584
   > p50: 15.682220458984375
   > Avg: 16.990032211302186
   > StdDev: 6.2820178635081145
   > 
   > **source mxnet-mkl==1.4.1**
   > Wasn't able to build, will edit once I figure out what is wrong.
   > 
   > **pip mxnet-mkl==1.5.1**
   > Model: resnext-50
   > ----Latency----
   > Model Partition Time: 0
   > Model Load Time: 80.60860633850098
   > First Inference: 121.35004997253418
   > p99: 37.58740425109863
   > p90: 30.339956283569336
   > p50: 27.198076248168945
   > Avg: 27.73231802529281
   > StdDev: 6.622306336227176
   > 
   > **source mxnet-mkl==1.5.1**
   > Model: resnext-50
   > ----Latency----
   > Model Partition Time: 0
   > Model Load Time: 73.66228103637695
   > First Inference: 114.89081382751465
   > p99: 46.54550552368164
   > p90: 30.79986572265625
   > p50: 27.560710906982422
   > Avg: 28.353479811816488
   > StdDev: 10.021275501124565
   
   Hi @jonatan1626 , may I know the exact command you used to build MXNet with MKLDNN from source (mine is `make USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel -j`)? Which version of gcc/g++ did you use to build MXNet? I found that there can be a large performance difference between different gcc versions on c5.18xlarge.
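
   For reference, here is a minimal sketch of how I pin the compiler for a source build (assuming the Makefile honors `CC`/`CXX` overrides on the make command line; the gcc-4.8 package names just match what I installed via apt-get):

   ```bash
   # Sketch only: build MXNet with MKLDNN from source using a pinned gcc/g++.
   # gcc-4.8/g++-4.8 are the apt-get packages I used; adjust to the toolchain
   # you actually want to compare.
   sudo apt-get install -y gcc-4.8 g++-4.8
   make clean
   make CC=gcc-4.8 CXX=g++-4.8 \
        USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel \
        -j$(nproc)
   ```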
   
   The results below were measured on **c5.18xlarge - Deep Learning AMI (Ubuntu 16.04) Version 26.0 (ami-025ed45832b817a35)**.
   
   gcc version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
   
   I compared the performance of several MKLDNN primitives across three MXNet builds: the pip-installed package, a source build with gcc 4.8, and a source build with gcc 5.4. It looks like the performance of the `batch_norm` primitive (ref:any pass) is significantly affected by the gcc/g++ version. Previously, I also mentioned that the performance regression only happens when running Gluon hybridized models (no op fusion by default). For symbolic models, `batch_norm` is fused into `conv` automatically (with `MXNET_SUBGRAPH_BACKEND=MKLDNN`), so the standalone `batch_norm` primitive is never called; see the sketch after the table below.
   
   mkldnn primitives (ms) | gcc 4.8.5 (pip) | gcc 4.8.5 (source, apt-get 
install gcc-4.8) | gcc 5.4.0 (source, default)
   -- | -- | -- | --
   Conv | 5705.7 | 5672.18 | 5636.06
   inner product | 100.172 | 106.25 | 101.991
   **batch_norm** | **11484.1** | **11757.1** | **5065.54**
   softmax | 9.24 | 11.82 | 11.94
   reorder | 305.375 | 306.86 | 308.302
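
   To check the fusion behavior mentioned above, here is a rough sketch (the inference script name is only a placeholder) for enabling the MKLDNN subgraph backend and capturing a per-primitive verbose log:

   ```bash
   # Sketch: enable conv + batch_norm fusion for symbolic models via the MKLDNN
   # subgraph backend, and dump per-primitive logs with MKLDNN verbose mode.
   export MXNET_SUBGRAPH_BACKEND=MKLDNN
   MKLDNN_VERBOSE=1 python run_inference.py > mkldnn_verbose.log 2>&1  # run_inference.py is a placeholder
   # If fusion kicks in, there should be (almost) no standalone batch_norm entries:
   grep -c batch_norm mkldnn_verbose.log
   ```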
   
   Could you help double-check whether my results are reasonable on your side? You can use this [script](https://gist.github.com/wuxun-zhang/8e1bc466fd08bc78561fd1620bfe58c3#file-autorun_with_mkldnn_verbose-sh) to profile the MKLDNN primitives (no other patches are needed), and then you will get results like the ones above. Please feel free to ping me if you have any questions. Thanks.
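
   If you want a quick sanity check without any extra tooling, the kind of per-primitive aggregation I have in mind boils down to something like this (assuming the usual `mkldnn_verbose,exec,<primitive>,...,<time_ms>` line format and a captured log named mkldnn_verbose.log):

   ```bash
   # Sketch: sum execution time (ms) per MKLDNN primitive from a verbose log.
   # Field 3 is the primitive name; the last field is the execution time in ms.
   awk -F, '/^mkldnn_verbose,exec/ { total[$3] += $NF }
            END { for (p in total) printf "%-16s %.3f ms\n", p, total[p] }' mkldnn_verbose.log
   ```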
   
