sewardto opened a new issue #4972: Performance regression of quantization on CUDA after [Relay][AutoTVM] Relay op strategy (#4644)
URL: https://github.com/apache/incubator-tvm/issues/4972

My environment:
```
Linux ziran-pc 5.5.6-1-MANJARO #1 SMP Mon Feb 24 09:24:51 UTC 2020 x86_64 GNU/Linux
CUDA Version: 10.2
Python 3.8.1
gcc (Arch Linux 9.2.1+20200130-2) 9.2.1 20200130
```

Here is my code, which uses the **resnet18v1 ONNX** model.
```
# Imports and ctx are assumed; they are not shown in the original snippet.
import onnx
import tvm
from tvm import relay
from tvm.contrib.debugger import debug_runtime
from tvm.relay.quantize import quantize

resnetv1 = onnx.load('models/resnet18v1.onnx')
input_blob = resnetv1.graph.input[0]
input_shape = tuple(map(lambda x: getattr(x, 'dim_value'), input_blob.type.tensor_type.shape.dim))
shape_dict = {input_blob.name: input_shape}
mod_resnetv1, params_resnetv1 = relay.frontend.from_onnx(resnetv1, shape_dict)
mod_q_resnetv1 = quantize(mod_resnetv1, params_resnetv1)
graph, mod, params = relay.build_module.build(mod_q_resnetv1, target='cuda', params=params_resnetv1)

ctx = tvm.gpu(0)  # assumed CUDA device 0
val_data = get_val_data()  # user-defined data loader, not shown in the post
for i, batch in enumerate(val_data):
    if i > 0:
        break
    data, categories = batch['data'], batch['label']
    m = debug_runtime.create(graph, mod, ctx, dump_root='tvmdbg')
    m.set_input('data', tvm.nd.array(data.astype('float32')))
    m.run()
    tvm_out = m.get_output(0, tvm.nd.empty(tuple([1, 1000]), 'float32')).asnumpy()
```

Output when TVM is at ([Fix] Fix get_valid_count flaky test for cuda (#4901)):
```
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 32.843 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 13.9443 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 321.255 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 16.196 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.0867 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 319.658 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 322.954 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 15.1093 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 63.3707 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 482.38 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 508.352 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 12.5682 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 10.7158 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 506.871 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 510.042 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 12.363 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 77.2029 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 691.62 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 532.286 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 10.7689 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 9.9673 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 538.167 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 540.056 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 11.4951 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 104.663 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 962.534 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 1023.26 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 9.9758 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 9.3292 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 1025.56 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 1024.85 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 10.0607 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 12.0975 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.2545 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 21.2773 us/iter
Node Name  Ops  Time(us)  Time(%)  Shape  Inputs  Outputs
---------  ---  --------  -------  -----  ------  -------
fused_nn_conv2d_multiply_add_nn_relu  fused_nn_conv2d_multiply_add_nn_relu  1685.52  14.294  (1, 64, 112, 112)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  1025.56  8.697  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  1024.85  8.691  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  1023.26  8.678  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  962.534  8.163  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  691.62  5.865  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  540.056  4.58  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  538.167  4.564  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  532.286  4.514  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  510.042  4.325  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  508.352  4.311  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  506.871  4.299  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  482.38  4.091  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  322.954  2.739  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  321.255  2.724  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  320.88  2.721  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  319.658  2.711  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  104.663  0.888  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  77.203  0.655  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  63.371  0.537  (1, 128, 28, 28)  4  1
fused_nn_max_pool2d_1  fused_nn_max_pool2d_1  32.843  0.279  (1, 64, 56, 56)  1  1
fused_nn_dense_nn_bias_add  fused_nn_dense_nn_bias_add  21.277  0.18  (1, 1000)  3  1
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  16.196  0.137  (1, 64, 56, 56)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  15.109  0.128  (1, 64, 56, 56)  2  1
fused_multiply_round_clip_cast  fused_multiply_round_clip_cast  13.944  0.118  (1, 64, 56, 56)  1  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  12.568  0.107  (1, 128, 28, 28)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  12.363  0.105  (1, 128, 28, 28)  2  1
fused_nn_global_avg_pool2d_cast_multiply  fused_nn_global_avg_pool2d_cast_multiply  12.097  0.103  (1, 512, 1, 1)  1  1
fused_cast_25  fused_cast_25  12.087  0.103  (1, 64, 56, 56)  1  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  11.495  0.097  (1, 256, 14, 14)  2  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  10.769  0.091  (1, 256, 14, 14)  2  1
fused_cast_24  fused_cast_24  10.716  0.091  (1, 128, 28, 28)  1  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  10.061  0.085  (1, 512, 7, 7)  2  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  9.976  0.085  (1, 512, 7, 7)  2  1
fused_cast_23  fused_cast_23  9.967  0.085  (1, 256, 14, 14)  1  1
fused_cast_22  fused_cast_22  9.329  0.079  (1, 512, 7, 7)  1  1
fused_nn_batch_flatten_nn_batch_flatten_multiply  fused_nn_batch_flatten_nn_batch_flatten_multiply  9.254  0.078  (1, 512)  1  1
Total_time  -  11791.534  -  -  -  -
```

Output when TVM is at ([Relay][AutoTVM] Relay op strategy (#4644)):
```
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 4584.26 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 30.2865 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 14.6314 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 5281.79 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 5251.26 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 19.2247 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.4631 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 5161.39 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 5320.71 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 107.187 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 59.8113 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 426.696 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 9036.95 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 18.7588 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 13.5717 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 9323.67 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 9690.43 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 76.843 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 70.4272 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 596.825 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 9047.68 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 56.8034 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 10.0938 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 8854.5 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 9212.74 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 14.1323 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 93.6364 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 843.468 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 11918 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 56.1085 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 10.012 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 11729.8 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 12051.1 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 38.601 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 22.1764 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.9415 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 22.5578 us/iter
Node Name  Ops  Time(us)  Time(%)  Shape  Inputs  Outputs
---------  ---  --------  -------  -----  ------  -------
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  12051.1  10.119  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9  11918.0  10.008  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7  11729.8  9.85  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6  9690.43  8.137  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3  9323.67  7.829  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8  9212.74  7.736  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7  9047.68  7.597  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5  9036.95  7.588  (1, 128, 28, 28)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5  8854.5  7.435  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4  5320.71  4.468  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_  5281.79  4.435  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3  5251.26  4.41  (1, 64, 56, 56)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1  5161.39  4.334  (1, 64, 56, 56)  4  1
fused_nn_conv2d_multiply_add_nn_relu  fused_nn_conv2d_multiply_add_nn_relu  4584.26  3.849  (1, 64, 112, 112)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6  843.468  0.708  (1, 512, 7, 7)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4  596.825  0.501  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2  426.696  0.358  (1, 128, 28, 28)  4  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2  107.187  0.09  (1, 64, 56, 56)  2  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_  93.636  0.079  (1, 512, 7, 7)  4  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1  76.843  0.065  (1, 128, 28, 28)  2  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1  70.427  0.059  (1, 256, 14, 14)  4  1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2  59.811  0.05  (1, 128, 28, 28)  4  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1  56.803  0.048  (1, 256, 14, 14)  2  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_  56.108  0.047  (1, 512, 7, 7)  2  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast  38.601  0.032  (1, 512, 7, 7)  2  1
fused_nn_max_pool2d_1  fused_nn_max_pool2d_1  30.287  0.025  (1, 64, 56, 56)  1  1
fused_nn_dense_nn_bias_add  fused_nn_dense_nn_bias_add  22.558  0.019  (1, 1000)  3  1
fused_nn_global_avg_pool2d_cast_multiply  fused_nn_global_avg_pool2d_cast_multiply  22.176  0.019  (1, 512, 1, 1)  1  1
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_  19.225  0.016  (1, 64, 56, 56)  2  1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2  18.759  0.016  (1, 128, 28, 28)  2  1
fused_multiply_round_clip_cast  fused_multiply_round_clip_cast  14.631  0.012  (1, 64, 56, 56)  1  1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_  14.132  0.012  (1, 256, 14, 14)  2  1
fused_cast_24  fused_cast_24  13.572  0.011  (1, 128, 28, 28)  1  1
fused_cast_25  fused_cast_25  12.463  0.01  (1, 64, 56, 56)  1  1
fused_cast_23  fused_cast_23  10.094  0.008  (1, 256, 14, 14)  1  1
fused_cast_22  fused_cast_22  10.012  0.008  (1, 512, 7, 7)  1  1
fused_nn_batch_flatten_nn_batch_flatten_multiply  fused_nn_batch_flatten_nn_batch_flatten_multiply  9.941  0.008  (1, 512)  1  1
Total_time  -  119088.537  -  -  -  -
```

Besides, the accuracy after the commit is close to zero on the ILSVRC2012_img_val dataset.
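
For anyone who wants to see which ops regressed most, here is a rough sketch that parses the `Op #N name: T us/iter` lines from two debug-runtime dumps and ranks ops by slowdown. The helper names (`parse_op_times`, `slowdowns`) are hypothetical and not part of TVM, and the sample strings are only a few entries copied from the logs above. Going by the two `Total_time` rows (11791.534 us vs 119088.537 us), the overall slowdown is roughly 10x.

```python
import re

# Matches debug-runtime lines such as "... Op #3 fused_xxx: 320.88 us/iter".
OP_LINE = re.compile(r"Op #(\d+) (\S+): ([\d.]+) us/iter")


def parse_op_times(log_text):
    """Return {op_name: microseconds} for every 'Op #N name: T us/iter' entry."""
    return {name: float(us) for _, name, us in OP_LINE.findall(log_text)}


def slowdowns(before_log, after_log):
    """Return (op_name, after/before ratio) pairs, worst regression first."""
    before = parse_op_times(before_log)
    after = parse_op_times(after_log)
    ratios = [(name, after[name] / before[name]) for name in before if name in after]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# A few entries copied from the two logs in this issue (path prefix dropped).
before_log = """
Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter
Op #1 fused_nn_max_pool2d_1: 32.843 us/iter
Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
"""
after_log = """
Op #0 fused_nn_conv2d_multiply_add_nn_relu: 4584.26 us/iter
Op #1 fused_nn_max_pool2d_1: 30.2865 us/iter
Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 5281.79 us/iter
"""

for name, ratio in slowdowns(before_log, after_log):
    print(f"{ratio:6.2f}x  {name}")

# Overall slowdown from the Total_time rows of the two profiles.
print(f"total: {119088.537 / 11791.534:.1f}x")
```

In practice the full captured logs would be passed in place of the short sample strings; the ranking makes it easier to see that the regression is concentrated in the quantized `conv2d` kernels, while pooling and dense are essentially unchanged.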
