wuxun-zhang edited a comment on issue #17159: Performance regression from 1.4.1 to 1.5.1
URL: https://github.com/apache/incubator-mxnet/issues/17159#issuecomment-569933797

> @apeforest
> Is this what you wanted: 1.4.1 pip run, 1.5.1 pip run, 1.4.1 source run, 1.5.1 source run?
>
> **pip mxnet-mkl==1.4.1**
> Model: resnext-50
> ----Latency----
> Model Partition Time: 0
> Model Load Time: 87.3405933380127
> First Inference: 113.36421966552734
> p99: 20.80082893371582
> p90: 19.6378231048584
> p50: 15.682220458984375
> Avg: 16.990032211302186
> StdDev: 6.2820178635081145
>
> **source mxnet-mkl==1.4.1**
> Wasn't able to build; will edit once I figure out what is wrong.
>
> **pip mxnet-mkl==1.5.1**
> Model: resnext-50
> ----Latency----
> Model Partition Time: 0
> Model Load Time: 80.60860633850098
> First Inference: 121.35004997253418
> p99: 37.58740425109863
> p90: 30.339956283569336
> p50: 27.198076248168945
> Avg: 27.73231802529281
> StdDev: 6.622306336227176
>
> **source mxnet-mkl==1.5.1**
> Model: resnext-50
> ----Latency----
> Model Partition Time: 0
> Model Load Time: 73.66228103637695
> First Inference: 114.89081382751465
> p99: 46.54550552368164
> p90: 30.79986572265625
> p50: 27.560710906982422
> Avg: 28.353479811816488
> StdDev: 10.021275501124565

Hi @jonatan1626, may I know your exact command to build MXNet with MKLDNN from source (mine is `make USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel -j`)? Which version of gcc/g++ did you use to build MXNet? I found that there may be a significant performance difference between gcc versions on c5.18xlarge. The results below were collected on **c5.18xlarge - Deep Learning AMI (Ubuntu 16.04) Version 26.0 (ami-025ed45832b817a35)**.

gcc version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

I compared the performance of several MKLDNN primitives for MXNet in three configurations: pip-installed, built from source with gcc 4.8, and built from source with gcc 5.4.
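As an aside, the p50/p90/p99/Avg/StdDev figures in the quoted report can be reproduced from a list of per-inference latencies with a small helper like the sketch below. The nearest-rank percentile method is an assumption on my part; the original benchmarking harness may compute percentiles differently.

```python
import statistics

def latency_summary(samples_ms):
    """Summarize per-inference latencies (ms) using nearest-rank percentiles."""
    s = sorted(samples_ms)

    def pct(p):
        # nearest-rank percentile: take the round(p% * n)-th smallest sample
        k = min(len(s) - 1, max(0, int(round(p / 100.0 * len(s))) - 1))
        return s[k]

    return {
        "p99": pct(99),
        "p90": pct(90),
        "p50": pct(50),
        "Avg": statistics.mean(s),
        "StdDev": statistics.stdev(s),
    }

# Example with synthetic latencies of 1..100 ms
print(latency_summary(list(range(1, 101))))
```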
It looks like the performance of the `batch_norm` primitive (ref:any pass) is significantly affected by the gcc/g++ version. Previously, I also mentioned that the performance regression only happens when running Gluon hybridized models (no op fusion by default). For symbolic models, `batch_norm` is fused with `conv` automatically (with `MXNET_SUBGRAPH_BACKEND=MKLDNN`), so the `batch_norm` primitive is never called.

mkldnn primitives (ms) | gcc 4.8.5 (pip) | gcc 4.8.5 (source, apt-get install gcc-4.8) | gcc 5.4.0 (source, default)
-- | -- | -- | --
Conv | 5705.7 | 5672.18 | 5636.06
inner product | 100.172 | 106.25 | 101.991
**batch_norm** | **11484.1** | **11757.1** | **5065.54**
softmax | 9.24 | 11.82 | 11.94
reorder | 305.375 | 306.86 | 308.302

Could you help double-check whether my results are reasonable on your side? You can use this [script](https://gist.github.com/wuxun-zhang/8e1bc466fd08bc78561fd1620bfe58c3#file-autorun_with_mkldnn_verbose-sh) to profile the MKLDNN primitives (no other patches needed), and then you can get results like those above. Feel free to ping me if you have any questions. Thanks.
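For anyone who wants to post-process the output themselves: profiling of this kind works by enabling MKLDNN verbose mode (`MKLDNN_VERBOSE=1`) and summing the per-primitive execution times that MKLDNN prints to stdout. Below is a minimal sketch of that aggregation step; the verbose lines in the example are illustrative only, with the middle descriptor fields abbreviated to `...`, and the exact line format may vary across MKLDNN versions.

```python
from collections import defaultdict

def aggregate_mkldnn_verbose(lines):
    """Sum execution time (ms) per primitive from MKLDNN_VERBOSE=1 output.

    Verbose lines look roughly like:
    mkldnn_verbose,exec,<primitive>,<impl>,<prop_kind>,...,<time_ms>
    """
    totals = defaultdict(float)
    for line in lines:
        if not line.startswith("mkldnn_verbose,exec,"):
            continue  # skip non-exec lines (e.g. create, or unrelated output)
        fields = line.strip().split(",")
        primitive = fields[2]        # e.g. convolution, batch_normalization
        time_ms = float(fields[-1])  # last field is the elapsed time in ms
        totals[primitive] += time_ms
    return dict(totals)

# Illustrative verbose lines (descriptor fields abbreviated):
log = [
    "mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,...,2.80",
    "mkldnn_verbose,exec,batch_normalization,jit:avx512_common,forward_inference,...,1.10",
    "mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,...,2.90",
]
totals = aggregate_mkldnn_verbose(log)
print({k: round(v, 2) for k, v in totals.items()})
# → {'convolution': 5.7, 'batch_normalization': 1.1}
```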
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
