QueensGambit opened a new issue #15640: Performance regression for MXNet 1.5.0
URL: https://github.com/apache/incubator-mxnet/issues/15640

## Description

I experience a performance regression on both CPU (mxnet-mkl) and GPU (mxnet-cu100) when updating from version 1.4.1 to 1.5.0. The regression is about 8% on CPU; if you are interested, I can also provide a report for GPU.

## Environment info

MXNet version 1.4.1
```
$ pip install mxnet-mkl==1.4.1
```
and MXNet version 1.5.0
```
$ pip install mxnet-mkl==1.5.0
```

* Intel® Core™ i5-8250U CPU @ 1.60GHz × 8
* Ubuntu 18.04.2 LTS
* `export MXNET_SUBGRAPH_BACKEND=MKLDNN`

## Profiling results

```
mx.__version__
```
'1.4.1'
```
print(profiler.dumps())
```
```
Profile Statistics. Note that counter items are counter values and not time units.

Device Storage
=================
Name                      Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                      -----------    ---------  -------------  -------------  -------------
Memory: cpu/0                  120158   24832.1641       313.3440     24834.2129     12260.4336

MXNET_C_API
=================
Name                      Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                      -----------    ---------  -------------  -------------  -------------
MXNDArrayWaitAll                    1   90628.5391     90628.5391     90628.5391     90628.5391
MXExecutorForward               10000     199.3470         0.0150         0.0610         0.0199
MXImperativeInvokeEx            10000      58.7390         0.0030         0.0820         0.0059
MXNDArrayGetShape               40000      21.5920         0.0000         0.0150         0.0005
MXNet C API Concurrency        120002       0.0000         0.0000         0.0010         0.0005
MXNet C API Calls               60001      60.0010         0.0010        60.0010        30.0000

operator
=================
Name                      Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                      -----------    ---------  -------------  -------------  -------------
Reorder                         40082     104.3530         0.0000         1.0880         0.0026
[_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,broadcast_add,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,
                                20000  182261.7500         8.3630        70.7680         9.1131
DeleteVariable                 240000     636.3090         0.0000         0.0410         0.0027
CopyCPU2CPU                     20000     103.6500         0.0020         0.0280         0.0052
```

```
mx.__version__
```
'1.5.0'
```
print(profiler.dumps())
```
```
Profile Statistics: Note the difference in units for different entries.

Device Storage
=================
Name                      Total Count  Min Use (kB)   Max Use (kB)   Avg Use (kB)
----                      -----------  -------------  -------------  -------------
Memory: cpu/0                  240440       313.3440     43597.1289     21641.8926

MXNET_C_API
=================
Name                      Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                      -----------    ---------  -------------  -------------  -------------
MXNDArrayWaitAll                    2  196511.1406     97689.1719     98821.9688     98255.5703
MXNet C API Calls              220002     220.0020         0.0010       220.0020       110.0000
MXExecutorForward               20000     758.2010         0.0220         0.9980         0.0379
MXImperativeInvokeEx            20000     187.6520         0.0050         0.2250         0.0094
MXExecutorOutputs               20000      27.7760         0.0000         0.0230         0.0014
MXNDArrayGetShapeEx             80000      63.9810         0.0000         0.0480         0.0008
MXNDArrayFree                   40000      25.8550         0.0000         0.0160         0.0006
MXNDArrayGetStorageType         40000      23.0740         0.0000         0.0150         0.0006
MXNet C API Concurrency        440004       0.0000         0.0000         0.0010         0.0000

operator
=================
Name                      Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                      -----------    ---------  -------------  -------------  -------------
[_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,broadcast_add,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,
                                40000  396250.1875         9.0350        81.8880         9.9063
CopyCPU2CPU                     40000     239.0970         0.0040         0.0550         0.0060
DeleteVariable                 480168    1348.4821         0.0000         0.1520         0.0028
Reorder                        560000     869.6200         0.0000        11.9040         0.0016
```

Consequently, this results in a lower number of evaluated positions per second (nps) for the engine:

#### MXNet-mkl 1.4.1
```
go movetime 5000
```
```
info score cp 71 depth 30 nodes 697 time 4727 nps 147 pv e2e4 d7d5 e4d5 e7e6 f1b5 c7c6 d5c6 b7c6 b5c6
```

#### MXNet-mkl 1.5.0
```
go movetime 5000
```
```
info score cp 57 depth 25 nodes 673 time 4743 nps 141 pv e2e4 d7d5 e4d5 d8d5 b1c3 d5a5 g1f3 g8f6 f1c4 c8g4 e1g1 e7e6 d2d4 P@a3 a1b1 a3b2 c1b2 P@h3 c4e6
```

## Minimum reproducible example

Download the latest release **CrazyAra 0.5.1**, which contains the model weights and architecture definition in `CrazyAra_0.5.1_RiseV2_mobile/model/`:

* https://github.com/QueensGambit/CrazyAra/releases

Run the following script on MXNet 1.4.1 and on MXNet 1.5.0:

```python
import mxnet as mx
import numpy as np
from mxnet import profiler

sym, arg_params, aux_params = mx.model.load_checkpoint('model-1.19246-0.603', 223)
sample = mx.ndarray.zeros((1, 34, 8, 8))
it = int(1e4)
ctx = mx.gpu()  # mx.cpu()

executor = sym.simple_bind(ctx=ctx, data=sample.shape, grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)

def run_inference():
    out = executor.forward(is_train=False, data=sample)

profiler.set_config(profile_all=True, aggregate_stats=True, filename='profile_output.json')

# warm-up
for i in range(100):
    run_inference()

# Ask the profiler to start recording
profiler.set_state('run')

for i in range(it):
    run_inference()

# Make sure all operations have completed
mx.nd.waitall()

# Ask the profiler to stop recording
profiler.set_state('stop')

# Dump all results to log file before download
profiler.dump()
```

In order to run the engine and check its nps, do the following:

```
python crazyara.py
uci
setoption name context value cpu
isready
go movetime 5000
```

More information about the UCI protocol is available here:

* http://wbec-ridderkerk.nl/html/UCIProtocol.html

---

MXNet 1.5.0 introduced performance regressions for some operators:

* https://docs.google.com/spreadsheets/d/1_eezNWbrBAm3s3i6G1m0Rd3YYdTEnmKlYtn4klqdyN0/edit#gid=196553607
* https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50

which might be the cause of this problem. This issue might also be related to https://github.com/apache/incubator-mxnet/issues/15430.

Best,
~QueensGambit
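As a side note, comparing the two aggregate-stats dumps by eye is error-prone; the operator rows can be parsed into a dict of average times instead. This is only a minimal sketch: the `parse_avg_times` helper is hypothetical and assumes rows end with four floats (total, min, max, avg) preceded by a count, as in the dumps above.

```python
import re

def parse_avg_times(dump_text):
    """Parse aggregate-stats rows into {operator name: avg time in ms}.

    Hypothetical helper: assumes each data row is
    <name> <count> <total> <min> <max> <avg>, as in the dumps above.
    """
    avg_times = {}
    row = re.compile(r'^(\S.*?)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$')
    for line in dump_text.splitlines():
        m = row.match(line)
        if m:
            avg_times[m.group(1)] = float(m.group(6))
    return avg_times

# Two illustrative rows copied from the 1.4.1 and 1.5.0 dumps above.
old = parse_avg_times("""
Reorder                 40082     104.3530    0.0000     1.0880    0.0026
DeleteVariable         240000     636.3090    0.0000     0.0410    0.0027
""")
new = parse_avg_times("""
Reorder                560000     869.6200    0.0000    11.9040    0.0016
DeleteVariable         480168    1348.4821    0.0000     0.1520    0.0028
""")

for name in sorted(set(old) & set(new)):
    print(f"{name}: {old[name]:.4f} ms -> {new[name]:.4f} ms")
```

Feeding each full operator section of `profiler.dumps()` through this would give a per-operator diff between 1.4.1 and 1.5.0.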
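To automate the nps comparison instead of reading the numbers off the console, the engine's UCI `info` lines can be parsed; in practice the lines would be read from the engine's stdout after sending `go movetime 5000`. A minimal sketch, using the two output lines shown above as samples:

```python
def parse_nps(info_line):
    """Return the nodes-per-second value from a UCI 'info' line, or None."""
    tokens = info_line.split()
    if 'nps' in tokens:
        return int(tokens[tokens.index('nps') + 1])
    return None

# Sample 'info' lines from the two runs above; in practice these would come
# from the engine's stdout (e.g. via subprocess) after 'go movetime 5000'.
line_141 = "info score cp 71 depth 30 nodes 697 time 4727 nps 147 pv e2e4 d7d5"
line_150 = "info score cp 57 depth 25 nodes 673 time 4743 nps 141 pv e2e4 d7d5"

nps_old = parse_nps(line_141)
nps_new = parse_nps(line_150)
print(f"nps 1.4.1: {nps_old}, nps 1.5.0: {nps_new}")
print(f"slowdown: {100 * (1 - nps_new / nps_old):.1f}%")
```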
