buaalsy2003 opened a new issue #19218:
URL: https://github.com/apache/incubator-mxnet/issues/19218


   ## Description
   I have trained a resnet50 model. When running inference using an early 
checkpoint (0003.params) on CPU, it runs on reasonable speed. However, it is 
much slower to run inference using a later checkpoint (0023.params) on CPU. 
Both models runs very fast at similar speed on GPU. I ran the profile and it 
turns out that the issue is at Conv Layer, where some Conv take up to 110ms. 
   
   Here are the profiling results:
   fast model 0003.params on CPU:
   `Profile Statistics:
        Note the difference in units for different entries.
   Device Storage
   =================
   Name                          Total Count    Min Use  (kB)    Max Use  (kB)  
  Avg Use  (kB)
   ----                          -----------    -------------    -------------  
  -------------
   Memory: cpu/0                         672           0.0000      214184.3125  
    107092.1562
   MXNET_C_API
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   MXNDArrayWaitAll                        1        5506.6421        5506.6421  
      5506.6421        5506.6421
   MXExecutorForward                     100          13.2700           0.0760  
         0.3040           0.1327
   MXNet C API Calls                    2301           2.3010           0.0010  
         2.3010           1.1500
   MXImperativeInvokeEx                  100           1.5130           0.0090  
         0.0560           0.0151
   MXNDArrayGetShapeEx                  1200           0.3880           0.0000  
         0.0090           0.0003
   MXNDArrayReshape64                    100           0.1490           0.0010  
         0.0040           0.0015
   MXExecutorOutputs                     100           0.0730           0.0000  
         0.0020           0.0007
   MXNDArrayGetContext                   200           0.0390           0.0000  
         0.0010           0.0002
   MXNDArrayFree                         200           0.0340           0.0000  
         0.0010           0.0002
   MXNDArrayGetDType                     200           0.0250           0.0000  
         0.0010           0.0001
   MXNDArrayGetStorageType               100           0.0150           0.0000  
         0.0010           0.0002
   MXNet C API Concurrency              4602           0.0000           0.0000  
         0.0010           0.0000
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   _sg_mkldnn_conv                      5300        3687.2700           0.0430  
         3.5340           0.6957
   LeakyReLU                            2500         722.8160           0.1400  
         2.5450           0.2891
   _sg_mkldnn_fully_connected             100         194.0460           1.8040 
          2.4230           1.9405
   BatchNorm                            2600         102.9240           0.0120  
         0.2850           0.0396
   Dropout                               100          14.9120           0.1140  
         0.2520           0.1491
   DeleteVariable                        559           6.8570           0.0010  
         1.3360           0.0123
   CopyCPU2CPU                           100           1.5180           0.0130  
         0.0240           0.0152
   _minus_scalar                         100           1.3820           0.0110  
         0.0230           0.0138
   _mul_scalar                           100           0.9680           0.0070  
         0.0260           0.0097`
   
   slow model 0023.params on CPU
   `Profile Statistics:
        Note the difference in units for different entries.
   Device Storage
   =================
   Name                          Total Count    Min Use  (kB)    Max Use  (kB)  
  Avg Use  (kB)
   ----                          -----------    -------------    -------------  
  -------------
   Memory: cpu/0                         672           0.0000      214184.3125  
    107092.1562
   MXNET_C_API
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   MXNDArrayWaitAll                        1       78402.4453       78402.4453  
     78402.4453       78402.4453
   MXExecutorForward                     100          15.5860           0.0890  
         0.2150           0.1559
   MXNet C API Calls                    2301           2.3010           0.0010  
         2.3010           1.1500
   MXImperativeInvokeEx                  100           1.5790           0.0090  
         0.0270           0.0158
   MXNDArrayGetShapeEx                  1200           0.5500           0.0000  
         0.0100           0.0005
   MXNDArrayReshape64                    100           0.1820           0.0010  
         0.0090           0.0018
   MXExecutorOutputs                     100           0.0740           0.0000  
         0.0030           0.0007
   MXNDArrayGetContext                   200           0.0540           0.0000  
         0.0010           0.0003
   MXNDArrayGetDType                     200           0.0390           0.0000  
         0.0010           0.0002
   MXNDArrayFree                         200           0.0360           0.0000  
         0.0010           0.0002
   MXNDArrayGetStorageType               100           0.0260           0.0000  
         0.0010           0.0003
   MXNet C API Concurrency              4602           0.0000           0.0000  
         0.0010           0.0000
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   _sg_mkldnn_conv                      5300       68954.4688           0.0450  
       109.0230          13.0103
   LeakyReLU                            2500        1831.9830           0.1510  
         4.5000           0.7328
   _sg_mkldnn_fully_connected             100         186.8240           1.8090 
          2.2790           1.8682
   BatchNorm                            2600         115.5410           0.0120  
         0.8330           0.0444
   Dropout                               100          15.9720           0.1430  
         0.2250           0.1597
   DeleteVariable                        559           6.3650           0.0010  
         1.4840           0.0114
   _minus_scalar                         100           1.4650           0.0120  
         0.0220           0.0147
   CopyCPU2CPU                           100           1.4390           0.0130  
         0.0190           0.0144
   _mul_scalar                           100           0.8480           0.0070  
         0.0150           0.0085`
   
   ### Error Message
   No error
   
   ## To Reproduce
   `    def run_inference():
           out = executor.forward(is_train=False, data=sample)
           pass
       sym, arg_params, aux_params = mx.model.load_checkpoint('model', 23)
       sample = mx.ndarray.zeros((1, 3, 112, 112))
       it = 100
       ctx = mx.cpu()  # mx.cpu()
       executor = sym.simple_bind(ctx=ctx, data=sample.shape, grad_req='null', 
force_rebind=True)
       executor.copy_params_from(arg_params, aux_params)
   
       profiler.set_config(profile_all=True,
                           aggregate_stats=True,
                           filename='profile_output.json')
   
       # warm-up
       for i in range(10):
           run_inference()
   
       # Ask the profiler to start recording
       profiler.set_state('run')
   
       for i in range(it):
           run_inference()
   
       # Make sure all operations have completed
       mx.nd.waitall()
       # Ask the profiler to stop recording
       profiler.set_state('stop')
       # Dump all results to log file before download
       print(profiler.dumps())`
   
   ### Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1. run above code with attached 0003.params and 0023.params
   2. compare the output
   
   ## What have you tried to solve it?
   
   1.
   2.
   
   ## Environment
   MXNet 1.4 cu100mkl, MXNet 1.6 cu100mkl, MXNet1.7 native. All versions could 
reproduce the issue. 
   
   
   0003.params, 0023.params and .json are in the following drive:
   
https://drive.google.com/file/d/14MImTcENaLjKZDygdAm5rPSUK-h52lhV/view?usp=sharing
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to