sewardto opened a new issue #4972: Performance regression of quantization on CUDA after [Relay][AutoTVM] Relay op strategy (#4644)
URL: https://github.com/apache/incubator-tvm/issues/4972
 
 
   My environment:
   ```
   Linux ziran-pc 5.5.6-1-MANJARO #1 SMP Mon Feb 24 09:24:51 UTC 2020 x86_64 GNU/Linux
   CUDA Version: 10.2
   Python 3.8.1
   gcc (Arch Linux 9.2.1+20200130-2) 9.2.1 20200130
   ```
   
   Here is my code, which uses the **resnet18v1 ONNX** model.
   ```
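   # Imports were omitted from the original snippet; these are the assumed ones.
   import onnx
   import tvm
   from tvm import relay
   from tvm.relay.quantize import quantize
   from tvm.contrib.debugger import debug_runtime

   # Load the ONNX model and read the input shape from its first graph input.
   resnetv1 = onnx.load('models/resnet18v1.onnx')
   input_blob = resnetv1.graph.input[0]
   input_shape = tuple(d.dim_value for d in input_blob.type.tensor_type.shape.dim)
   shape_dict = {input_blob.name: input_shape}
   mod_resnetv1, params_resnetv1 = relay.frontend.from_onnx(resnetv1, shape_dict)
   
   # Quantize the model, then build it for CUDA.
   mod_q_resnetv1 = quantize(mod_resnetv1, params_resnetv1)
   graph, mod, params = relay.build_module.build(mod_q_resnetv1, target='cuda', params=params_resnetv1)
   
   ctx = tvm.gpu(0)  # assumed: `ctx` was not defined in the original snippet
   val_data = get_val_data()  # my own data loader (not shown here)
   for i, batch in enumerate(val_data):
       if i > 0:
           break
       data, categories = batch['data'], batch['label']
       m = debug_runtime.create(graph, mod, ctx, dump_root='tvmdbg')
       m.set_input('data', tvm.nd.array(data.astype('float32')))
       m.run()
       tvm_out = m.get_output(0, tvm.nd.empty((1, 1000), 'float32')).asnumpy()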
   ```
   
   Output with TVM checked out at commit "[Fix] Fix get_valid_count flaky test for cuda (#4901)":
   
   ```
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 32.843 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 13.9443 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 321.255 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 16.196 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.0867 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 319.658 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 322.954 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 15.1093 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 63.3707 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 482.38 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 508.352 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 12.5682 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 10.7158 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 506.871 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 510.042 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 12.363 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 77.2029 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 691.62 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 532.286 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 10.7689 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 9.9673 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 538.167 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 540.056 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 11.4951 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 104.663 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 962.534 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 1023.26 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 9.9758 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 9.3292 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 1025.56 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 1024.85 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 10.0607 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 12.0975 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.2545 us/iter
   [22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 21.2773 us/iter
   Node Name                                                                    
                              Ops                                               
                                                         Time(us)   Time(%)  
Shape              Inputs  Outputs
   ---------                                                                    
                              ---                                               
                                                         --------   -------  
-----              ------  -------
   fused_nn_conv2d_multiply_add_nn_relu                                         
                              fused_nn_conv2d_multiply_add_nn_relu              
                                                         1685.52    14.294   
(1, 64, 112, 112)  4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7
   1025.56    8.697    (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10
  
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10
  1024.85    8.691    (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9
   1023.26    8.678    (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6
   962.534    8.163    (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4
   691.62     5.865    (1, 256, 14, 14)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8
   540.056    4.58     (1, 256, 14, 14)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5
   538.167    4.564    (1, 256, 14, 14)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7
   532.286    4.514    (1, 256, 14, 14)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6
   510.042    4.325    (1, 128, 28, 28)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5
   508.352    4.311    (1, 128, 28, 28)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3
   506.871    4.299    (1, 128, 28, 28)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2
   482.38     4.091    (1, 128, 28, 28)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4
   322.954    2.739    (1, 64, 56, 56)    4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3
   321.255    2.724    (1, 64, 56, 56)    4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_
     
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_
     320.88     2.721    (1, 64, 56, 56)    4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1
   319.658    2.711    (1, 64, 56, 56)    4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_
     
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_
     104.663    0.888    (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1
   77.203     0.655    (1, 256, 14, 14)   4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2
   63.371     0.537    (1, 128, 28, 28)   4       1
   fused_nn_max_pool2d_1                                                        
                              fused_nn_max_pool2d_1                             
                                                         32.843     0.279    
(1, 64, 56, 56)    1       1
   fused_nn_dense_nn_bias_add                                                   
                              fused_nn_dense_nn_bias_add                        
                                                         21.277     0.18     
(1, 1000)          3       1
   
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_
      
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_
      16.196     0.137    (1, 64, 56, 56)    2       1
   
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2
    
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2
    15.109     0.128    (1, 64, 56, 56)    2       1
   fused_multiply_round_clip_cast                                               
                              fused_multiply_round_clip_cast                    
                                                         13.944     0.118    
(1, 64, 56, 56)    1       1
   
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2
   
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2
   12.568     0.107    (1, 128, 28, 28)   2       1
   
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1
    
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1
    12.363     0.105    (1, 128, 28, 28)   2       1
   fused_nn_global_avg_pool2d_cast_multiply                                     
                              fused_nn_global_avg_pool2d_cast_multiply          
                                                         12.097     0.103    
(1, 512, 1, 1)     1       1
   fused_cast_25                                                                
                              fused_cast_25                                     
                                                         12.087     0.103    
(1, 64, 56, 56)    1       1
   
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_
      
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_
      11.495     0.097    (1, 256, 14, 14)   2       1
   
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1
   
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1
   10.769     0.091    (1, 256, 14, 14)   2       1
   fused_cast_24                                                                
                              fused_cast_24                                     
                                                         10.716     0.091    
(1, 128, 28, 28)   1       1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast    
                              
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast       
                           10.061     0.085    (1, 512, 7, 7)     2       1
   
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_
     
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_
     9.976      0.085    (1, 512, 7, 7)     2       1
   fused_cast_23                                                                
                              fused_cast_23                                     
                                                         9.967      0.085    
(1, 256, 14, 14)   1       1
   fused_cast_22                                                                
                              fused_cast_22                                     
                                                         9.329      0.079    
(1, 512, 7, 7)     1       1
   fused_nn_batch_flatten_nn_batch_flatten_multiply                             
                              fused_nn_batch_flatten_nn_batch_flatten_multiply  
                                                         9.254      0.078    
(1, 512)           1       1
   Total_time                                                                   
                              -                                                 
                                                         11791.534  -        -  
                -       -
   ```
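
To compare a log like the one above with the log from another TVM build op by op, the `Op #N <name>: <time> us/iter` records can be pulled out with a small script. This is only a minimal sketch and not part of TVM: `parse_op_times` and `slowdowns` are hypothetical helper names, and the regular expression assumes the record format shown in the debug runtime output above (including the line wraps that appear when the log is pasted into an email).

```python
import re

# One debug-runtime record looks like "Op #<index> <name>: <time> us/iter".
# The \s+ and \s* parts tolerate line breaks inside a record in pasted logs.
OP_RE = re.compile(r"Op #(\d+)\s+(\S+?):\s*([\d.]+) us/iter")


def parse_op_times(log_text):
    """Return {op_name: time_in_us} for every per-op record in the log text."""
    return {name: float(us) for _, name, us in OP_RE.findall(log_text)}


def slowdowns(before, after, threshold=2.0):
    """Return (name, before_us, after_us, ratio) for ops slower by >= threshold."""
    rows = []
    for name, before_us in before.items():
        if name not in after:
            continue  # op is missing from the second log, nothing to compare
        ratio = after[name] / before_us
        if ratio >= threshold:
            rows.append((name, before_us, after[name], ratio))
    # Largest regression first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Two records for the same op, one from each build (values from this issue).
    before = parse_op_times(
        "Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter")
    after = parse_op_times(
        "Op #0 fused_nn_conv2d_multiply_add_nn_relu: 4584.26 us/iter")
    for name, b, a, ratio in slowdowns(before, after):
        print(f"{name}: {b:.1f} -> {a:.1f} us ({ratio:.1f}x)")
```

In practice the two strings would be the full captured logs of each run; the `threshold` of 2.0 is an arbitrary cut-off chosen for illustration.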
   
   Output with TVM checked out at commit "[Relay][AutoTVM] Relay op strategy (#4644)":
   ```
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 4584.26 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 30.2865 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 14.6314 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 5281.79 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 5251.26 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 19.2247 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.4631 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 5161.39 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 5320.71 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 107.187 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 59.8113 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 426.696 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 9036.95 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 18.7588 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 13.5717 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 9323.67 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 9690.43 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 76.843 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 70.4272 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 596.825 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 9047.68 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 56.8034 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 10.0938 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 8854.5 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 9212.74 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 14.1323 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 93.6364 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 843.468 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 11918 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 56.1085 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 10.012 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 11729.8 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 12051.1 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 38.601 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 22.1764 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.9415 us/iter
   [22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 22.5578 us/iter
   Node Name                                                                    
                              Ops                                               
                                                         Time(us)    Time(%)  
Shape              Inputs  Outputs
   ---------                                                                    
                              ---                                               
                                                         --------    -------  
-----              ------  -------
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10
  
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10
  12051.1     10.119   (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9
   11918.0     10.008   (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7
   11729.8     9.85     (1, 512, 7, 7)     4       1
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6
   
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6
   9690.43     8.137    (1, 128, 28, 28)   4       1
   
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  9323.67  7.829  (1, 128, 28, 28)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  9212.74  7.736  (1, 256, 14, 14)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  9047.68  7.597  (1, 256, 14, 14)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  9036.95  7.588  (1, 128, 28, 28)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  8854.5  7.435  (1, 256, 14, 14)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  5320.71  4.468  (1, 64, 56, 56)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  5281.79  4.435  (1, 64, 56, 56)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  5251.26  4.41  (1, 64, 56, 56)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  5161.39  4.334  (1, 64, 56, 56)  4  1
   fused_nn_conv2d_multiply_add_nn_relu  fused_nn_conv2d_multiply_add_nn_relu  4584.26  3.849  (1, 64, 112, 112)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  843.468  0.708  (1, 512, 7, 7)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  596.825  0.501  (1, 256, 14, 14)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  426.696  0.358  (1, 128, 28, 28)  4  1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  107.187  0.09  (1, 64, 56, 56)  2  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  93.636  0.079  (1, 512, 7, 7)  4  1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  76.843  0.065  (1, 128, 28, 28)  2  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  70.427  0.059  (1, 256, 14, 14)  4  1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  59.811  0.05  (1, 128, 28, 28)  4  1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  56.803  0.048  (1, 256, 14, 14)  2  1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  56.108  0.047  (1, 512, 7, 7)  2  1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  38.601  0.032  (1, 512, 7, 7)  2  1
   fused_nn_max_pool2d_1  fused_nn_max_pool2d_1  30.287  0.025  (1, 64, 56, 56)  1  1
   fused_nn_dense_nn_bias_add  fused_nn_dense_nn_bias_add  22.558  0.019  (1, 1000)  3  1
   fused_nn_global_avg_pool2d_cast_multiply  fused_nn_global_avg_pool2d_cast_multiply  22.176  0.019  (1, 512, 1, 1)  1  1
   fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  19.225  0.016  (1, 64, 56, 56)  2  1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  18.759  0.016  (1, 128, 28, 28)  2  1
   fused_multiply_round_clip_cast  fused_multiply_round_clip_cast  14.631  0.012  (1, 64, 56, 56)  1  1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  14.132  0.012  (1, 256, 14, 14)  2  1
   fused_cast_24  fused_cast_24  13.572  0.011  (1, 128, 28, 28)  1  1
   fused_cast_25  fused_cast_25  12.463  0.01  (1, 64, 56, 56)  1  1
   fused_cast_23  fused_cast_23  10.094  0.008  (1, 256, 14, 14)  1  1
   fused_cast_22  fused_cast_22  10.012  0.008  (1, 512, 7, 7)  1  1
   fused_nn_batch_flatten_nn_batch_flatten_multiply  fused_nn_batch_flatten_nn_batch_flatten_multiply  9.941  0.008  (1, 512)  1  1
   Total_time  -  119088.537  -  -  -  -
   ```
   
   In addition, the accuracy after the commit is close to zero on the ILSVRC2012_img_val dataset.
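
For anyone trying to reproduce the accuracy drop, here is a minimal sketch of a top-1 check that could be applied to the model outputs before and after the commit. `top1_accuracy` is a hypothetical helper (it is not part of TVM), and the synthetic `scores` array only stands in for stacked `tvm_out` results; the labels are assumed to be integer class indices.

```python
import numpy as np


def top1_accuracy(outputs, labels):
    """Return the fraction of rows whose argmax matches the given label.

    outputs: array of shape (N, num_classes), e.g. stacked model outputs.
    labels:  N integer class indices (assumed format).
    """
    outputs = np.asarray(outputs)
    labels = np.asarray(labels).astype("int64").reshape(-1)
    predictions = outputs.argmax(axis=1)
    return float((predictions == labels).mean())


# Synthetic stand-in for real model outputs: 4 samples, 1000 classes.
scores = np.zeros((4, 1000), dtype="float32")
scores[np.arange(4), [3, 7, 7, 42]] = 1.0

# Three of the four predictions match the labels.
accuracy = top1_accuracy(scores, [3, 7, 8, 42])
print(accuracy)
```

Running the same check on the validation set for both TVM revisions should show whether the drop comes from the quantized build itself rather than from the evaluation loop.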
