sewardto commented on issue #4972: Performance regression of quantization on CUDA after [Relay][AutoTVM] Relay op strategy (#4644)
URL: https://github.com/apache/incubator-tvm/issues/4972#issuecomment-596034110

After auto-tuning on a GTX 1070 Max-Q, inference is much faster. Per-op profile from the debug graph runtime:

```
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 180.928 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 29.4735 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 14.2788 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 78.2669 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 83.7044 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 16.1587 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.3677 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 80.4706 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 83.8792 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 15.3279 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 27.4534 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 90.9006 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 101.824 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 12.3551 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 10.8766 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 100.764 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 101.819 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 12.1372 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 37.6922 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 115.165 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 150.852 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 10.9095 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 9.8539 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 150.405 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 150.738 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 11.0084 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 36.0865 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 161.054 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 252.2 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 9.9324 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 9.2837 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 252.951 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 252.633 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 9.8974 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 12.1224 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.1246 us/iter
[10:09:15] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 22.1244 us/iter

Node Name  Ops  Time(us)  Time(%)  Shape  Inputs  Outputs
---------  ---  --------  -------  -----  ------  -------
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  252.951  9.31  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  252.633  9.298  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  252.2  9.282  (1, 512, 7, 7)  4  1
fused_nn_conv2d_multiply_add_nn_relu  fused_nn_conv2d_multiply_add_nn_relu  180.928  6.659  (1, 64, 112, 112)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  161.054  5.928  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  150.852  5.552  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  150.738  5.548  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  150.405  5.536  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  115.165  4.239  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  101.824  3.748  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  101.819  3.747  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  100.764  3.709  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  90.901  3.346  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  83.879  3.087  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  83.704  3.081  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  80.471  2.962  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  78.267  2.881  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  37.692  1.387  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  36.087  1.328  (1, 512, 7, 7)  4  1
fused_nn_max_pool2d_1  fused_nn_max_pool2d_1  29.474  1.085  (1, 64, 56, 56)  1  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  27.453  1.01  (1, 128, 28, 28)  4  1
fused_nn_dense_nn_bias_add  fused_nn_dense_nn_bias_add  22.124  0.814  (1, 1000)  3  1
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  16.159  0.595  (1, 64, 56, 56)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  15.328  0.564  (1, 64, 56, 56)  2  1
fused_multiply_round_clip_cast  fused_multiply_round_clip_cast  14.279  0.526  (1, 64, 56, 56)  1  1
fused_cast_25  fused_cast_25  12.368  0.455  (1, 64, 56, 56)  1  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  12.355  0.455  (1, 128, 28, 28)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  12.137  0.447  (1, 128, 28, 28)  2  1
fused_nn_global_avg_pool2d_cast_multiply  fused_nn_global_avg_pool2d_cast_multiply  12.122  0.446  (1, 512, 1, 1)  1  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  11.008  0.405  (1, 256, 14, 14)  2  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  10.909  0.402  (1, 256, 14, 14)  2  1
fused_cast_24  fused_cast_24  10.877  0.4  (1, 128, 28, 28)  1  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  9.932  0.366  (1, 512, 7, 7)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  9.897  0.364  (1, 512, 7, 7)  2  1
fused_cast_23  fused_cast_23  9.854  0.363  (1, 256, 14, 14)  1  1
fused_cast_22  fused_cast_22  9.284  0.342  (1, 512, 7, 7)  1  1
fused_nn_batch_flatten_nn_batch_flatten_multiply  fused_nn_batch_flatten_nn_batch_flatten_multiply  9.125  0.336  (1, 512)  1  1
Total_time  -  2717.019  -  -  -  -
```

The AutoTVM tuning log follows:
```
{"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 512, 7, 7], "int8"], ["TENSOR", [512, 512, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 267206, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 16, 4]], ["tile_x", "sp", [-1, 1, 8, 2]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 0], ["unroll_explicit", "ot", 1]]}, "result": [[0.00010122895650355499], 0, 2.20457124710083, 1583490003.2625034], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 512, 7, 7], "int8"], ["TENSOR", [512, 512, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 684894, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 1]], ["tile_y", "sp", [-1, 7, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[0.00016264779663299663], 0, 2.149959087371826, 1583493321.5959847], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [512, 256, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 618454, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[0.00011399263832785345], 0, 2.7174792289733887, 1583495309.8544536], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [512, 256, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 142595, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 1, 1]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[2.1068503167097867e-05], 0, 1.9730663299560547, 1583497423.7057633], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [256, 256, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 38688, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 64, 2]], ["tile_x", "sp", [-1, 7, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]]}, "result": [[8.167179043743642e-05], 0, 2.713115692138672, 1583499212.5806458], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 256, 14, 14], "int8"], ["TENSOR", [256, 256, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 8973055, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 8, 1]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[0.00010543543769129865], 0, 2.6481783390045166, 1583501043.3432517], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [256, 128, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 8001541, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[7.353533925290876e-05], 0, 2.8866312503814697, 1583503869.7059126], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [256, 128, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 1584197, "code_hash": null, "entity": [["tile_f", "sp", [-1, 4, 16, 1]], ["tile_y", "sp", [-1, 2, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 32]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 1]]}, "result": [[1.4255146899458157e-05], 0, 2.4680140018463135, 1583507510.7302575], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [128, 128, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 543511, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 32, 1]], ["tile_x", "sp", [-1, 7, 28, 1]], ["tile_rc", "sp", [-1, 32]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[8.007428490878938e-05], 0, 2.5169312953948975, 1583511425.53789], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 128, 28, 28], "int8"], ["TENSOR", [128, 128, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 36148587, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 1]], ["tile_y", "sp", [-1, 1, 2, 14]], ["tile_x", "sp", [-1, 1, 4, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[6.772174803149606e-05], 0, 2.3003950119018555, 1583516359.102587], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [128, 64, 3, 3], "int8"], [2, 2], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 31741619, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 2]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 7, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[5.370661902625084e-05], 0, 3.261442184448242, 1583518624.2030337], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [128, 64, 1, 1], "int8"], [2, 2], [0, 0, 0, 0], [1, 1], "int32"], {}], "config": {"index": 2195842, "code_hash": null, "entity": [["tile_f", "sp", [-1, 1, 16, 4]], ["tile_y", "sp", [-1, 1, 1, 2]], ["tile_x", "sp", [-1, 1, 28, 1]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 1]], ["tile_rx", "sp", [-1, 1]], ["auto_unroll_max_step", "ot", 512], ["unroll_explicit", "ot", 0]]}, "result": [[1.1885200473884825e-05], 0, 2.094285726547241, 1583521724.6021314], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [64, 64, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 94814, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 2, 8, 4]], ["tile_x", "sp", [-1, 1, 28, 1]], ["tile_rc", "sp", [-1, 16]], ["auto_unroll_max_step", "ot", 128], ["unroll_explicit", "ot", 0]]}, "result": [[6.607622816593886e-05], 0, 3.667783737182617, 1583526061.1939218], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 64, 56, 56], "int8"], ["TENSOR", [64, 64, 3, 3], "int8"], [1, 1], [1, 1, 1, 1], [1, 1], "int32"], {}], "config": {"index": 88977971, "code_hash": null, "entity": [["tile_f", "sp", [-1, 2, 16, 2]], ["tile_y", "sp", [-1, 1, 2, 7]], ["tile_x", "sp", [-1, 1, 4, 2]], ["tile_rc", "sp", [-1, 16]], ["tile_ry", "sp", [-1, 3]], ["tile_rx", "sp", [-1, 3]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 1]]}, "result": [[4.877310412853366e-05], 0, 2.9656577110290527, 1583528602.3562064], "version": 0.2, "tvm_version": "0.7.dev1"}
{"input": ["cuda -model=unknown", "conv2d_nchw.cuda", [["TENSOR", [1, 3, 224, 224], "float32"], ["TENSOR", [64, 3, 7, 7], "float32"], [2, 2], [3, 3, 3, 3], [1, 1], "float32"], {}], "config": {"index": 36609153, "code_hash": null, "entity": [["tile_f", "sp", [-1, 8, 8, 1]], ["tile_y", "sp", [-1, 7, 1, 1]], ["tile_x", "sp", [-1, 1, 14, 1]], ["tile_rc", "sp", [-1, 1]], ["tile_ry", "sp", [-1, 7]], ["tile_rx", "sp", [-1, 7]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]]}, "result": [[7.900877518104015e-05], 0, 2.1976771354675293, 1583533339.5779808], "version": 0.2, "tvm_version": "0.7.dev1"}
```

However, the accuracy is still close to zero:

```
Top1 Acc: 0.0026109660574412533, 1/383
Top5 Acc: 0.010443864229765013, 4/383
```
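To see where the remaining time goes, the per-op lines from the debug graph runtime can be bucketed by fused-op name. The following is a minimal sketch, assuming the `Op #N name: T us/iter` line format shown in the profile above; the function `summarize` and the conv2d/dense/other grouping are hypothetical choices for illustration, not part of TVM.

```python
import re
from collections import defaultdict

# Matches the per-op lines printed by the debug graph runtime, e.g.
# "... graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 180.928 us/iter"
OP_LINE = re.compile(r"Op #(\d+) (\w+): ([0-9.]+) us/iter")


def summarize(log_text):
    """Sum per-iteration time (microseconds) into coarse, hypothetical buckets.

    A fused op goes into "conv2d" if its name contains "conv2d", into "dense"
    if it contains "dense", and into "other" otherwise (casts, requantize
    arithmetic, pooling, flatten). Lines that do not match, such as
    "Iteration: 0", are ignored.
    """
    totals = defaultdict(float)
    for _op_index, name, micros in OP_LINE.findall(log_text):
        if "conv2d" in name:
            bucket = "conv2d"
        elif "dense" in name:
            bucket = "dense"
        else:
            bucket = "other"
        totals[bucket] += float(micros)
    return dict(totals)
```

On the full profile, the conv2d bucket accounts for well over 90% of the 2717.019 us total, so the quantized convolutions are where tuning matters most.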
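The tuning log above has both `conv2d_nchw.cuda` and `conv2d_nchw_winograd.cuda` records for several int8 3x3 workloads. A minimal sketch for comparing them, assuming one JSON record per line in the layout shown (`input[1]` is the template name, `input[2]` the workload arguments, `result[0]` the measured costs in seconds, `result[1]` the error code); `best_per_template` and `compare_templates` are hypothetical helpers, and TVM's own `tvm.autotvm.record` module may be preferable in practice.

```python
import json


def best_per_template(log_lines):
    """Map (workload, template) -> lowest mean measured cost in seconds."""
    best = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        template = record["input"][1]              # e.g. "conv2d_nchw_winograd.cuda"
        workload = json.dumps(record["input"][2])  # shapes, dtypes, strides, padding, ...
        costs, error_no = record["result"][0], record["result"][1]
        if error_no != 0:
            continue  # skip failed measurements
        mean_cost = sum(costs) / len(costs)
        key = (workload, template)
        if key not in best or mean_cost < best[key]:
            best[key] = mean_cost
    return best


def compare_templates(best):
    """For workloads tuned with more than one template, list (cost, template) fastest first."""
    by_workload = {}
    for (workload, template), cost in best.items():
        by_workload.setdefault(workload, []).append((cost, template))
    return {w: sorted(entries) for w, entries in by_workload.items() if len(entries) > 1}
```

For the `[1, 512, 7, 7]` input with `[512, 512, 3, 3]` weights, the two records give roughly 101 us for the Winograd template versus roughly 163 us for the direct one.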
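The Top1/Top5 figures above are plain hit ratios over the 383 evaluated images (1/383 and 4/383). A minimal top-k accuracy sketch, assuming one list of class scores per image and integer ground-truth labels; `topk_accuracy` is a hypothetical helper, not the script that produced those numbers.

```python
def topk_accuracy(scores, labels, k):
    """Return (accuracy, hits): how often the true label is among the k top-scoring classes.

    scores: one list of per-class scores per image
    labels: the ground-truth class index for each image
    """
    hits = 0
    for image_scores, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(image_scores)), key=lambda c: image_scores[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels), hits
```

The dense layer outputs 1000 classes, so random guessing would score about 0.001 top-1 and 0.005 top-5; the measured 1/383 and 4/383 are close to that, which suggests the quantized model's outputs are essentially uninformative.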
