sewardto commented on issue #4972: Performance regression of quantization on 
CUDA after [Relay][AutoTVM] Relay op strategy (#4644) 
URL: https://github.com/apache/incubator-tvm/issues/4972#issuecomment-596034110
 
 
   After auto-tuning on a 1070 Max-Q, the speed is much faster:
   ```
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 180.928 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 29.4735 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 14.2788 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 78.2669 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 83.7044 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 16.1587 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.3677 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 80.4706 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 83.8792 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 15.3279 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 27.4534 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 90.9006 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 101.824 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 12.3551 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 10.8766 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 100.764 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 101.819 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 12.1372 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 37.6922 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 115.165 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 150.852 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 10.9095 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 9.8539 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 150.405 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 150.738 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 11.0084 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 36.0865 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 161.054 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 252.2 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 9.9324 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 9.2837 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 252.951 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 252.633 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 9.8974 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 12.1224 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.1246 us/iter
   [10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 22.1244 us/iter
   Node Name                                                                                                  Ops                                                                                                        Time(us)  Time(%)  Shape              Inputs  Outputs
   ---------                                                                                                  ---                                                                                                        --------  -------  -----              ------  -------
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   252.951   9.31     (1, 512, 7, 7)     4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  252.633   9.298    (1, 512, 7, 7)     4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   252.2     9.282    (1, 512, 7, 7)     4       1
   fused_nn_conv2d_multiply_add_nn_relu                                                                       fused_nn_conv2d_multiply_add_nn_relu                                                                       180.928   6.659    (1, 64, 112, 112)  4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   161.054   5.928    (1, 512, 7, 7)     4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   150.852   5.552    (1, 256, 14, 14)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   150.738   5.548    (1, 256, 14, 14)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   150.405   5.536    (1, 256, 14, 14)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   115.165   4.239    (1, 256, 14, 14)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   101.824   3.748    (1, 128, 28, 28)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   101.819   3.747    (1, 128, 28, 28)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   100.764   3.709    (1, 128, 28, 28)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   90.901    3.346    (1, 128, 28, 28)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   83.879    3.087    (1, 64, 56, 56)    4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   83.704    3.081    (1, 64, 56, 56)    4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   80.471    2.962    (1, 64, 56, 56)    4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     78.267    2.881    (1, 64, 56, 56)    4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   37.692    1.387    (1, 256, 14, 14)   4       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     36.087    1.328    (1, 512, 7, 7)     4       1
   fused_nn_max_pool2d_1                                                                                      fused_nn_max_pool2d_1                                                                                      29.474    1.085    (1, 64, 56, 56)    1       1
   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   27.453    1.01     (1, 128, 28, 28)   4       1
   fused_nn_dense_nn_bias_add                                                                                 fused_nn_dense_nn_bias_add                                                                                 22.124    0.814    (1, 1000)          3       1
   fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      16.159    0.595    (1, 64, 56, 56)    2       1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    15.328    0.564    (1, 64, 56, 56)    2       1
   fused_multiply_round_clip_cast                                                                             fused_multiply_round_clip_cast                                                                             14.279    0.526    (1, 64, 56, 56)    1       1
   fused_cast_25                                                                                              fused_cast_25                                                                                              12.368    0.455    (1, 64, 56, 56)    1       1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   12.355    0.455    (1, 128, 28, 28)   2       1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    12.137    0.447    (1, 128, 28, 28)   2       1
   fused_nn_global_avg_pool2d_cast_multiply                                                                   fused_nn_global_avg_pool2d_cast_multiply                                                                   12.122    0.446    (1, 512, 1, 1)     1       1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      11.008    0.405    (1, 256, 14, 14)   2       1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   10.909    0.402    (1, 256, 14, 14)   2       1
   fused_cast_24                                                                                              fused_cast_24                                                                                              10.877    0.4      (1, 128, 28, 28)   1       1
   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     9.932     0.366    (1, 512, 7, 7)     2       1
   fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  9.897     0.364    (1, 512, 7, 7)     2       1
   fused_cast_23                                                                                              fused_cast_23                                                                                              9.854     0.363    (1, 256, 14, 14)   1       1
   fused_cast_22                                                                                              fused_cast_22                                                                                              9.284     0.342    (1, 512, 7, 7)     1       1
   fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           9.125     0.336    (1, 512)           1       1
   Total_time                                                                                                 -                                                                                                          2717.019  -        -                  -       -
   ```
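
To compare runs like this one without reading the table by eye, the per-operator lines can be totalled with a few lines of Python. This is only a sketch: the regular expression is inferred from the `Op #N name: X us/iter` lines above rather than from any TVM API, and `parse_op_times` and `summarize` are hypothetical helper names.

```python
import re

# Matches the per-operator lines printed by the debug graph runtime above,
# e.g. "Op #1 fused_nn_max_pool2d_1: 29.4735 us/iter". \s+ also spans line
# breaks, so it tolerates logs that were hard-wrapped by a mail client.
OP_LINE = re.compile(r"Op #(\d+)\s+(\S+):\s+([\d.]+) us/iter")


def parse_op_times(log_text):
    """Return a list of (op_index, op_name, microseconds) tuples."""
    return [(int(i), name, float(us)) for i, name, us in OP_LINE.findall(log_text)]


def summarize(op_times, prefix="fused_nn_conv2d"):
    """Return (total_us, share), where share is the fraction of time spent
    in operators whose name starts with `prefix`."""
    total = sum(us for _, _, us in op_times)
    matched = sum(us for _, name, us in op_times if name.startswith(prefix))
    return total, matched / total


# First three operator lines from the log above.
sample = """
Op #0 fused_nn_conv2d_multiply_add_nn_relu: 180.928 us/iter
Op #1 fused_nn_max_pool2d_1: 29.4735 us/iter
Op #2 fused_multiply_round_clip_cast: 14.2788 us/iter
"""
ops = parse_op_times(sample)
total_us, conv_share = summarize(ops)
print(f"{len(ops)} ops, {total_us:.3f} us total, {conv_share:.1%} in conv2d")
```

Feeding it the full log should give a total close to the `Total_time` row, up to rounding.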
   The log file follows.
   ```
   {"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 512, 7, 7], "int8"], ["TENSOR", [512, 512, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 267206, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 16, 4]], ["tile_x", "sp", [-1, 1, 8, 2]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 0], ["unroll_explicit", "ot", 1]]}, "result": [[0.00010122895650355499], 0, 2.20457124710083, 1583490003.2625034], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 512, 7, 7], "int8"], ["TENSOR", [512, 512, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 684894, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 1]], ["tile_y", "sp", [-1, 7, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[0.00016264779663299663], 0, 2.149959087371826, 1583493321.5959847], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [512, 256, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 618454, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[0.00011399263832785345], 0, 2.7174792289733887, 1583495309.8544536], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [512, 256, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 142595, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[2.1068503167097867e-05], 0, 1.9730663299560547, 1583497423.7057633], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [256, 256, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 38688, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 64, 2]], ["tile_x", "sp", [-1, 7, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]]}, "result": [[8.167179043743642e-05], 0, 2.713115692138672, 1583499212.5806458], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [256, 256, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 8973055, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 8, 1]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[0.00010543543769129865], 0, 2.6481783390045166, 1583501043.3432517], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [256, 128, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 8001541, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[7.353533925290876e-05], 0, 2.8866312503814697, 1583503869.7059126], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [256, 128, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 1584197, "code_hash": null, "entity": [["tile_f", "sp", [-1, 4, 16, 1]], ["tile_y", "sp", [-1, 2, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[1.4255146899458157e-05], 0, 2.4680140018463135, 1583507510.7302575], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [128, 128, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 543511, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 32, 1]], ["tile_x", "sp", [-1, 7, 28, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[8.007428490878938e-05], 0, 2.5169312953948975, 1583511425.53789], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [128, 128, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 36148587, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 2, 14]], ["tile_x", "sp", [-1, 1, 4, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[6.772174803149606e-05], 0, 2.3003950119018555, 1583516359.102587], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [128, 64, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 31741619, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 2]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[5.370661902625084e-05], 0, 3.261442184448242, 1583518624.2030337], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [128, 64, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 2195842, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 4]], ["tile_y", "sp", [-1, 1, 1, 2]], ["tile_x", "sp", [-1, 1, 28, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 0]]}, "result": [[1.1885200473884825e-05], 0, 2.094285726547241, 1583521724.6021314], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [64, 64, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 94814, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 8, 4]], ["tile_x", "sp", [-1, 1, 28, 1]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 0]]}, "result": [[6.607622816593886e-05], 0, 3.667783737182617, 1583526061.1939218], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [64, 64, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 88977971, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 2]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 4, 2]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[4.877310412853366e-05], 0, 2.9656577110290527, 1583528602.3562064], "version": 0.2, "tvm_version": "0.7.dev1"}
   {"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "float32"], {}], "config": {"index": 36609153, "code_hash": null, "entity": [["tile_f", "sp", [-1, 8, 8, 1]], ["tile_y", "sp", [-1, 7, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 1]], ["tile_ry", "sp", [-1, 7]], ["tile_rx", "sp", [-1, 7]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]]}, "result": [[7.900877518104015e-05], 0, 2.1976771354675293, 1583533339.5779808], "version": 0.2, "tvm_version": "0.7.dev1"}
   
   ```
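
Several workloads in the log above appear twice, once tuned as `conv2d_nchw.cuda` and once as `conv2d_nchw_winograd.cuda`. A rough way to see which template measured faster per workload, using only the standard `json` module, is sketched below. The field layout (`input` = target, task name, args, kwargs; `result` = costs, error number, ...) is read off the records above and is an assumption, not a documented schema; `best_per_workload` is a hypothetical helper.

```python
import json


def best_per_workload(lines):
    """Map each workload (the args of a record) to its fastest
    (task_name, mean_seconds) pair."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        _target, task_name, args, _kwargs = rec["input"]
        costs, error_no = rec["result"][0], rec["result"][1]
        if error_no != 0:
            continue  # assumed: a non-zero error number marks a failed measurement
        mean_cost = sum(costs) / len(costs)
        key = json.dumps(args)  # lists are unhashable, so key on the JSON text
        if key not in best or mean_cost < best[key][1]:
            best[key] = (task_name, mean_cost)
    return best


# The two (1, 512, 7, 7) records from the log above, cut down to the
# fields this sketch reads.
args = [["TENSOR", [1, 512, 7, 7], "int8"], ["TENSOR", [512, 512, 3, 3], "int8"],
        [1, 1], [1, 1, 1, 1], [1, 1], "int32"]
sample = [
    json.dumps({"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", args, {}],
                "result": [[0.00010122895650355499], 0, 2.20457124710083, 1583490003.2625034]}),
    json.dumps({"input": ["cuda -model=unknown", "conv2d_nchw.cuda", args, {}],
                "result": [[0.00016264779663299663], 0, 2.149959087371826, 1583493321.5959847]}),
]
best = best_per_workload(sample)
for task, secs in best.values():
    print(f"{task}: {secs * 1e6:.1f} us")
```

To run it on a real log, pass an open file object in place of `sample`, provided the file has one record per line.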
   
   However, the accuracy is still close to zero:
   ```
   Top1 Acc: 0.0026109660574412533, 1/383
   Top5 Acc: 0.010443864229765013, 4/383
   ```
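
For scale, these figures can be set against random guessing. The sketch below assumes a 1000-class classifier, which the `(1, 1000)` dense output in the profile suggests but the comment does not state, and `topk_accuracy` is a hypothetical helper showing how such numbers are usually computed, not the evaluation script actually used here.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Chance level for a uniform random guess over num_classes classes.
num_classes = 1000  # assumption, see the note above
chance_top1 = 1 / num_classes
chance_top5 = 5 / num_classes

# Figures reported above: 1 and 4 correct out of 383 images.
top1, top5 = 1 / 383, 4 / 383
print(f"top1 {top1:.4f} vs chance {chance_top1:.4f}")
print(f"top5 {top5:.4f} vs chance {chance_top5:.4f}")
```

Under that assumption both reported accuracies are of the same order as chance, so the quantized model's predictions look essentially uninformative rather than merely degraded.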

