ElaineBao commented on issue #15640: Performance regression for MXNet 1.5.0

URL: https://github.com/apache/incubator-mxnet/issues/15640#issuecomment-514631220

Hi all, I've reproduced the case and confirmed that there is a regression in the `Activation` operator for specific input shapes. Here is the detailed report:

(1) I ran the code provided by @QueensGambit with the `MKLDNN` backend (changing `it` (iterations) in the script above from 1e4 to 1e3 for efficiency) and got the following profile statistics:

### mxnet 1.4.1
```
Profile Statistics. Note that counter items are counter values and not time units.
Device Storage
=================
Name                       Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                       -----------      ---------  -------------  -------------  -------------
Memory: cpu/0                   126343        13.5680         0.0000       905.4720       452.7360

MXNET_C_API
=================
Name                       Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                       -----------      ---------  -------------  -------------  -------------
MXInvokeCachedOpEx                1000         0.6340         0.0000         0.0020         0.0006
MXNDArrayFree                     2000         1.2910         0.0000         0.0030         0.0006
MXInvokeCachedOp                  1000      1432.7380         1.0730         3.3030         1.4327
MXNDArrayWaitAll                     1     21019.3457     21019.3457     21019.3457     21019.3457
MXNDArrayGetContext               1000         0.7310         0.0000         0.0100         0.0007
MXNet C API Concurrency          10002         0.0000         0.0000         0.0010         0.0005
MXNet C API Calls                 5001         5.0010         0.0010         5.0010         2.5000

operator
=================
Name                       Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                       -----------      ---------  -------------  -------------  -------------
FullyConnected                   24000      4591.6479         0.0310         0.5470         0.1913
Flatten                          14000      1809.8690         0.0090         0.2310         0.1293
broadcast_add                    26000     15939.2910         0.4090         1.7990         0.6130
BatchNorm                        84000      3556.9929         0.0190         0.9100         0.0423
broadcast_mul                    10000      2011.7510         0.1820         0.8450         0.2012
Reshape                          10000        38.4070         0.0020         0.0260         0.0038
Pooling                          10000       344.1310         0.0180         0.6350         0.0344
Convolution                      86000      7773.5220         0.0260         1.1880         0.0904
Reorder                           4716        11.4720         0.0010         0.0150         0.0024
Activation                       82000      2387.3359         0.0140         0.6090         0.0291
DeleteVariable                  141422       598.9450         0.0000         0.5270         0.0042
```

### mxnet 1.5.0
```
Profile Statistics: Note the difference in units for different entries.
Device Storage
=================
Name                       Total Count   Min Use (kB)   Max Use (kB)   Avg Use (kB)
----                       -----------  -------------  -------------  -------------
Memory: cpu/0                   127053         0.0000       891.9040       445.9520

MXNET_C_API
=================
Name                       Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                       -----------      ---------  -------------  -------------  -------------
MXNDArrayWaitAll                     1     23634.1523     23634.1523     23634.1523     23634.1523
MXNet C API Calls                 5001         5.0010         0.0010         5.0010         2.5000
MXInvokeCachedOp                  1000      1632.0590         1.0620         2.6950         1.6321
MXNDArrayGetContext               1000         0.7340         0.0000         0.0120         0.0007
MXNDArrayFree                     2000         1.4440         0.0000         0.0100         0.0007
MXInvokeCachedOpEx                1000         0.7130         0.0000         0.0020         0.0007
MXNet C API Concurrency          10002         0.0000         0.0000         0.0010         0.0000

operator
=================
Name                       Total Count      Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                       -----------      ---------  -------------  -------------  -------------
broadcast_add                    26000     17322.6543         0.4180         1.9220         0.6663
broadcast_mul                    10000      2184.6421         0.1850         0.7510         0.2185
FullyConnected                   24000      4494.6572         0.0310         0.8050         0.1873
Flatten                          14000      1890.4550         0.0090         0.2690         0.1350
Convolution                      86000      7898.2749         0.0260         1.3600         0.0918
Activation                       82000      4982.0112         0.0170         0.7210         0.0608
BatchNorm                        84000      3896.8450         0.0210         0.7810         0.0464
Pooling                          10000       315.3780         0.0170         0.6760         0.0315
Reshape                          10000        68.2620         0.0040         0.0230         0.0068
DeleteVariable                  142202       659.8880         0.0000         0.0680         0.0046
Reorder                          32420        53.1210         0.0000         0.0180         0.0016
```

Notice from the tables that the `Activation` operator shows a performance gap between 1.4.1 (avg time: 0.0291 ms) and 1.5.0 (avg time: 0.0608 ms). The differences for the other operators are considered normal fluctuation, keeping in mind that the profiler code changed between 1.4.1 and 1.5.0.
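As a side note, comparing the two versions row by row is easier if the aggregate tables are parsed mechanically instead of by eye. Below is a minimal pure-Python sketch; `parse_operator_rows` is a hypothetical helper (not part of MXNet), and it assumes the row layout shown above, where each data row is an operator name followed by count, total, min, max, and average columns:

```python
def parse_operator_rows(table_text):
    """Parse rows of an MXNet profiler 'operator' table into a dict:
    operator name -> (count, total_ms, min_ms, max_ms, avg_ms).
    Header and separator lines are skipped automatically."""
    stats = {}
    for line in table_text.strip().splitlines():
        tokens = line.split()
        if len(tokens) < 6:
            continue  # too short to be a data row
        try:
            numbers = [float(t) for t in tokens[-5:]]
        except ValueError:
            continue  # header or '----' separator, not numeric
        name = " ".join(tokens[:-5])  # names may contain spaces (e.g. "Memory: cpu/0")
        stats[name] = (int(numbers[0]), *numbers[1:])
    return stats

# Two rows copied from the 1.5.0 'operator' table above:
table = """
Activation 82000 4982.0112 0.0170 0.7210 0.0608
BatchNorm  84000 3896.8450 0.0210 0.7810 0.0464
"""
stats = parse_operator_rows(table)
print(stats["Activation"][-1])  # average time in ms
```

With both versions' tables parsed this way, the per-operator average times can be diffed directly to spot outliers like `Activation`.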
(2) To further investigate the performance regression of the `Activation` operator, I enabled MKLDNN verbose mode via `export MKLDNN_VERBOSE=1`, which collects basic statistics such as the execution time and input/output shapes of each MKLDNN primitive. Parsing the verbose output and computing the average time per input shape gives:

### mxnet 1.4.1
```
op: relu, shape:all, total_count:38500, total_time:386.78921302000026, avg_time:0.010046473065454552
  shape:mb1ic128, total_count:5500, total_time:43.13965790000107, avg_time:0.007843574163636558
  shape:mb1ic256ih8iw8, total_count:4400, total_time:41.59277998999979, avg_time:0.00945290454318177
  shape:mb1ic128ih8iw8, total_count:2200, total_time:19.7434104399999, avg_time:0.008974277472727226
  ......
op: sigmoid, shape:all, total_count:5500, total_time:56.511490069998786, avg_time:0.010274816376363415
  shape:mb1ic256, total_count:5500, total_time:56.511490069998786, avg_time:0.010274816376363415
```

### mxnet 1.5.0
```
op: relu, shape:all, total_count:38500, total_time:1070.7652004499996, avg_time:0.02781208312857142
  shape:mb1ic128, total_count:5500, total_time:690.7533760000006, avg_time:0.125591522909091
  shape:mb1ic256ih8iw8, total_count:4400, total_time:38.590089899999874, avg_time:0.008770474977272699
  shape:mb1ic128ih8iw8, total_count:2200, total_time:18.308107539999845, avg_time:0.008321867063636293
  ......
op: sigmoid, shape:all, total_count:5500, total_time:680.288773400003, avg_time:0.12368886789090965
  shape:mb1ic256, total_count:5500, total_time:680.288773400003, avg_time:0.12368886789090965
```

Notice from these numbers that for `relu` the shape `mb1ic128` (batch size = 1, input channels = 128) shows a performance regression (avg_time from 0.0078 ms to 0.1256 ms), while the other shapes are almost identical between 1.4.1 and 1.5.0. For `sigmoid`, the shape `mb1ic256` shows a regression (avg_time from 0.0103 ms to 0.1237 ms).
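The per-shape aggregation over the verbose log can be scripted. A rough sketch follows; the field layout it assumes (comma-separated `mkldnn_verbose,exec,...` lines with an `alg:<op>` field, the shape string second to last, and the execution time in ms last) varies between MKL-DNN versions, so the indices below are assumptions, and the sample lines are illustrative rather than real log output:

```python
from collections import defaultdict

def aggregate_mkldnn_verbose(lines):
    """Group MKLDNN_VERBOSE=1 'exec' lines by (op, shape).

    Returns a dict: (op, shape) -> [count, total_time_ms].
    Assumes the 'alg:<op>' field names the operation, the shape string
    is the second-to-last comma-separated field, and the last field is
    the execution time in milliseconds (layout is version-dependent).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 4 or fields[:2] != ["mkldnn_verbose", "exec"]:
            continue  # not an execution record
        # Strip the 'eltwise_' prefix so e.g. 'eltwise_relu' becomes 'relu'.
        op = next((f.split(":", 1)[1].replace("eltwise_", "")
                   for f in fields if f.startswith("alg:")), fields[2])
        shape, time_ms = fields[-2], float(fields[-1])
        totals[(op, shape)][0] += 1
        totals[(op, shape)][1] += time_ms
    return dict(totals)

# Illustrative lines in the assumed format (not real log output):
log = [
    "mkldnn_verbose,exec,eltwise,jit:avx2,forward_inference,"
    "fdata:nchw,alg:eltwise_relu,mb1ic128,0.0078",
    "mkldnn_verbose,exec,eltwise,jit:avx2,forward_inference,"
    "fdata:nchw,alg:eltwise_relu,mb1ic128,0.0080",
]
for (op, shape), (count, total) in aggregate_mkldnn_verbose(log).items():
    print(f"op:{op}, shape:{shape}, total_count:{count}, "
          f"total_time:{total:.4f}, avg_time:{total / count:.6f}")
```

Running the real verbose log from each MXNet version through such a script is what produced the per-shape breakdown above.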
On the other hand, as @QueensGambit mentioned, the regression does not show up for all models. I think the model in CrazyAraFish_0.5.0_RiseV1.zip may contain fewer of these operations, which would explain its performance in terms of the analysis above.

In conclusion, MKLDNN `relu` with shape `mb1ic128` and `sigmoid` with shape `mb1ic256` have a performance regression from 1.4.1 to 1.5.0. We'll look into it.
