QueensGambit opened a new issue #15640: Performance regression for MXNet 1.5.0
URL: https://github.com/apache/incubator-mxnet/issues/15640
 
 
   ## Description
   I experience a performance regression for both CPU (mxnet-mkl) and GPU 
(mxnet-cu100) when updating from version 1.4.1 to 1.5.0. The speed regression 
is about 8% on CPU, if you're interested I can also provide a report for GPU.
   
   ## Environment info
   
   MXNet version 1.4.1
   ```
   $ pip install mxnet-mkl==1.4.1
   ```
   and MXNet version 1.5.0
   ```
   $ pip install mxnet-mkl==1.5.0
   ```
   
   * Intel® Core™ i5-8250U CPU @ 1.60GHz × 8 
   * Ubuntu 18.04.2 LTS
   * `export MXNET_SUBGRAPH_BACKEND=MKLDNN`
   
   ## Profiling results:
   
   ```
   mx.__version__
   ```
   '1.4.1'
   ```
   print(profiler.dumps())
   ```
   
   ```
   Profile Statistics.
        Note that counter items are counter values and not time units.
   Device Storage
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   Memory: cpu/0                      120158       24832.1641         313.3440  
     24834.2129       12260.4336
   
   MXNET_C_API
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   MXNDArrayWaitAll                        1       90628.5391       90628.5391  
     90628.5391       90628.5391
   MXExecutorForward                   10000         199.3470           0.0150  
         0.0610           0.0199
   MXImperativeInvokeEx                10000          58.7390           0.0030  
         0.0820           0.0059
   MXNDArrayGetShape                   40000          21.5920           0.0000  
         0.0150           0.0005
   MXNet C API Concurrency            120002           0.0000           0.0000  
         0.0010           0.0005
   MXNet C API Calls                   60001          60.0010           0.0010  
        60.0010          30.0000
   
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   Reorder                             40082         104.3530           0.0000  
         1.0880           0.0026
   
[_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,broadcast_add,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,
           20000      182261.7500           8.3630          70.7680           
9.1131
   DeleteVariable                     240000         636.3090           0.0000  
         0.0410           0.0027
   CopyCPU2CPU                         20000         103.6500           0.0020  
         0.0280           0.0052
   ```
   
   
   ```
   mx.__version__
   ```
   
   '1.5.0'
   
   ```
   print(profiler.dumps())
   ```
   
   ```
   Profile Statistics:
        Note the difference in units for different entries.
   Device Storage
   =================
   Name                          Total Count    Min Use  (kB)    Max Use  (kB)  
  Avg Use  (kB)
   ----                          -----------    -------------    -------------  
  -------------
   Memory: cpu/0                      240440         313.3440       43597.1289  
     21641.8926
   
   MXNET_C_API
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   MXNDArrayWaitAll                        2      196511.1406       97689.1719  
     98821.9688       98255.5703
   MXNet C API Calls                  220002         220.0020           0.0010  
       220.0020         110.0000
   MXExecutorForward                   20000         758.2010           0.0220  
         0.9980           0.0379
   MXImperativeInvokeEx                20000         187.6520           0.0050  
         0.2250           0.0094
   MXExecutorOutputs                   20000          27.7760           0.0000  
         0.0230           0.0014
   MXNDArrayGetShapeEx                 80000          63.9810           0.0000  
         0.0480           0.0008
   MXNDArrayFree                       40000          25.8550           0.0000  
         0.0160           0.0006
   MXNDArrayGetStorageType             40000          23.0740           0.0000  
         0.0150           0.0006
   MXNet C API Concurrency            440004           0.0000           0.0000  
         0.0010           0.0000
   
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   
[_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,broadcast_add,_sg_mkldnn_conv,_sg_mkldnn_conv,_sg_mkldnn_conv,
           40000      396250.1875           9.0350          81.8880           
9.9063
   CopyCPU2CPU                         40000         239.0970           0.0040  
         0.0550           0.0060
   DeleteVariable                     480168        1348.4821           0.0000  
         0.1520           0.0028
   Reorder                            560000         869.6200           0.0000  
        11.9040           0.0016
   ```
   
   Consequently this results in a lower number of evaluated positions per 
second (nps) of the engine:
   
   #### MXNet-mkl 1.4.1
   
   ```
   go movetime 5000
   ```
   ```
   info score cp 71 depth 30 nodes 697 time 4727 nps 147 pv e2e4 d7d5 e4d5 e7e6 
f1b5 c7c6 d5c6 b7c6 b5c6
   ```
   
   
   #### MXNet-mkl 1.5.0
   
   ```
   go movetime 5000
   
   info score cp 57 depth 25 nodes 673 time 4743 nps 141 pv e2e4 d7d5 e4d5 d8d5 
b1c3 d5a5 g1f3 g8f6 f1c4 c8g4 e1g1 e7e6 d2d4 P@a3 a1b1 a3b2 c1b2 P@h3 c4e6
   ```
   
   ## Minimum reproducible example
   
   Download the latest release **CrazyAra 0.5.1** which contains the model 
weights and architecture definition in `CrazyAra_0.5.1_RiseV2_mobile/model/`.
   * https://github.com/QueensGambit/CrazyAra/releases
   
   Run the following script respectively on MXNet 1.4.1 and MXNet 1.5.0:
   
   ```python
   import mxnet as mx
   import numpy as np
   from mxnet import profiler
   
   sym, arg_params, aux_params = 
mx.model.load_checkpoint('model-1.19246-0.603', 223)
   sample = mx.ndarray.zeros((1, 34, 8, 8))
   it = int(1e4)
   ctx = mx.gpu() # mx.cpu()
   executor = sym.simple_bind(ctx=ctx, data=sample.shape, grad_req='null', 
force_rebind=True)
   executor.copy_params_from(arg_params, aux_params)
   
   def run_inference():
       out = executor.forward(is_train=False, data=sample)
   
   profiler.set_config(profile_all=True,
                       aggregate_stats=True,
                       filename='profile_output.json')
   
   # warm-up
   for i in range(100):
       run_inference()
       
   # Ask the profiler to start recording
   profiler.set_state('run')
   
   for i in range(it):
       run_inference()
   
   # Make sure all operations have completed
   mx.nd.waitall()
   # Ask the profiler to stop recording
   profiler.set_state('stop')
   # Dump all results to log file before download
   profiler.dump()
   ```
   
   In order to run the engine and check its nps do the following:
   ```
   python crazyara.py
   uci
   setoption name context value cpu
   isready
   go movetime 5000
   ```
   More information about the uci-protocoll is described here:
   * http://wbec-ridderkerk.nl/html/UCIProtocol.html
   
   ---
   MXNet 1.5.0 introduced performance regressions for some operators:
   * 
https://docs.google.com/spreadsheets/d/1_eezNWbrBAm3s3i6G1m0Rd3YYdTEnmKlYtn4klqdyN0/edit#gid=196553607
   * 
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
   
   which might be the cause of this problem.
   
   This issue might also be related to 
https://github.com/apache/incubator-mxnet/issues/15430.
   
   Best,
   ~QueensGambit
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to