shingjan commented on PR #12141:
URL: https://github.com/apache/tvm/pull/12141#issuecomment-1198304639
MobileNetV2 on CUDA:
```
ID |                                   Name |     FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
---------------------------------------------------------------------------------------------------------------------------------------------
 0 |                 fused_layout_transform |        1 |      1 |         0.0004 |       2.2798 |                2.2798 |      6 |
 1 |               fused_nn_conv2d_add_clip | 22880256 |      1 |      3151.3187 |       7.2605 |                7.2605 |     32 |          Y
 2 |             fused_nn_conv2d_add_clip_1 |  8429568 |      1 |      1285.2570 |       6.5587 |                6.5587 |     32 |          Y
 3 |                    fused_nn_conv2d_add | 13045760 |      1 |      2104.8376 |       6.1980 |                6.1980 |     32 |          Y
 4 |             fused_nn_conv2d_add_clip_2 | 42147840 |      1 |      2994.8494 |      14.0734 |               14.0734 |     32 |          Y
 5 |             fused_nn_conv2d_add_clip_3 |  6322176 |      1 |       682.4610 |       9.2638 |                9.2638 |     32 |          Y
 6 |                  fused_nn_conv2d_add_1 | 14525952 |      1 |      1936.7547 |       7.5002 |                7.5002 |     32 |          Y
 7 |             fused_nn_conv2d_add_clip_4 |  9483264 |      1 |      1537.1009 |       6.1696 |                6.1696 |     32 |          Y
 8 |                fused_nn_conv2d_add_add | 21826560 |      1 |      2005.0549 |      10.8858 |               10.8858 |     32 |          Y
 9 |             fused_nn_conv2d_add_clip_5 | 23030784 |      2 |      1914.2627 |      12.0312 |               24.0623 |     32 |          Y
10 |             fused_nn_conv2d_add_clip_6 |  2370816 |      1 |       393.0634 |       6.0316 |                6.0316 |     32 |          Y
11 |                  fused_nn_conv2d_add_2 |  7250432 |      1 |       917.3106 |       7.9040 |                7.9040 |     32 |          Y
12 |             fused_nn_conv2d_add_clip_7 |  3161088 |      2 |       262.2023 |      12.0559 |               24.1118 |     32 |          Y
13 |              fused_nn_conv2d_add_add_1 |  9683968 |      2 |      1061.2357 |       9.1252 |               18.2504 |     32 |          Y
14 |             fused_nn_conv2d_add_clip_8 | 10085376 |      3 |       737.1134 |      13.6823 |               41.0468 |     32 |          Y
15 |             fused_nn_conv2d_add_clip_9 |   790272 |      1 |       170.2160 |       4.6428 |                4.6428 |     32 |          Y
16 |                  fused_nn_conv2d_add_3 |  4829440 |      1 |       957.4766 |       5.0439 |                5.0439 |     32 |          Y
17 |              fused_nn_conv2d_add_add_2 |  9658880 |      3 |       919.5057 |      10.5044 |               31.5133 |     32 |          Y
18 |            fused_nn_conv2d_add_clip_10 |  9859584 |      4 |      1410.7424 |       6.9889 |               27.9557 |     32 |          Y
19 |            fused_nn_conv2d_add_clip_11 |  1580544 |      4 |       361.8447 |       4.3680 |               17.4721 |     32 |          Y
20 |                  fused_nn_conv2d_add_4 | 14469504 |      1 |       739.5858 |      19.5643 |               19.5643 |     32 |          Y
21 |            fused_nn_conv2d_add_clip_12 |  2370816 |      2 |       503.2051 |       4.7114 |                9.4229 |     32 |          Y
22 |              fused_nn_conv2d_add_add_3 | 21713664 |      2 |      1405.4021 |      15.4501 |               30.9003 |     32 |          Y
23 |            fused_nn_conv2d_add_clip_13 | 22014720 |      3 |      2486.7910 |       8.8527 |               26.5580 |     32 |          Y
24 |            fused_nn_conv2d_add_clip_14 |   592704 |      1 |       125.2444 |       4.7324 |                4.7324 |     32 |          Y
25 |                  fused_nn_conv2d_add_5 |  9039520 |      1 |       410.3605 |      22.0282 |               22.0282 |     32 |          Y
26 |              fused_nn_conv2d_add_add_4 | 15068480 |      2 |       411.3220 |      36.6343 |               73.2685 |     32 |          Y
27 |            fused_nn_conv2d_add_clip_15 | 15193920 |      3 |      1503.2292 |      10.1075 |               30.3226 |     32 |          Y
28 |            fused_nn_conv2d_add_clip_16 |   987840 |      3 |       224.3443 |       4.4032 |               13.2097 |     32 |          Y
29 |                  fused_nn_conv2d_add_6 | 30121280 |      1 |      1749.6604 |      17.2155 |               17.2155 |     32 |          Y
30 |            fused_nn_conv2d_add_clip_17 | 40328960 |      1 |      2609.1046 |      15.4570 |               15.4570 |     32 |          Y
31 |           fused_nn_adaptive_avg_pool2d |    64000 |      1 |        16.9965 |       3.7655 |                3.7655 |     32 |          Y
32 | fused_layout_transform_reshape_squeeze |        1 |      1 |         0.0002 |       4.3296 |                4.3296 |      6 |          Y
33 |                     fused_nn_dense_add |  2561000 |      1 |        66.7119 |      38.3890 |               38.3890 |     32 |          Y
---------------------------------------------------------------------------------------------------------------------------------------------
```
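For readers cross-checking the tuning table above, here is a small sketch of how the derived columns appear to relate to the raw ones. The relations (speed = FLOP / latency, weighted latency = latency × weight) are inferred from the printed numbers rather than taken from the TVM source, and the function names are hypothetical helpers, not part of any TVM API.

```python
# Sanity-check sketch for the tuning table. The formulas are an
# assumption inferred from the printed values, not quoted from TVM.

def gflops(flop: int, latency_us: float) -> float:
    """FLOP / latency, converted from FLOP-per-microsecond to GFLOPS."""
    return flop / (latency_us * 1e3)


def weighted_latency_us(latency_us: float, weight: int) -> float:
    """Latency scaled by how many times the task occurs in the model."""
    return latency_us * weight


# A few rows copied from the table: (name, FLOP, weight, latency in us).
rows = [
    ("fused_nn_conv2d_add_clip", 22880256, 1, 7.2605),
    ("fused_nn_conv2d_add_clip_5", 23030784, 2, 12.0312),
    ("fused_nn_conv2d_add_add_4", 15068480, 2, 36.6343),
]

for name, flop, weight, latency in rows:
    print(
        f"{name}: {gflops(flop, latency):.1f} GFLOPS, "
        f"weighted latency {weighted_latency_us(latency, weight):.4f} us"
    )
```

The recomputed values agree with the table up to rounding of the printed latency. Summing the weighted latency column over all tasks would give a rough per-inference estimate, under the same assumption.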
Profiler table:
```
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 82.0160 | 100.0000
1 | EvoSearch/Evolve/Mutation | 42.1468 | 51.3885
2 | SendToBuilder | 15.1365 | 18.4556
3 | EvoSearch/SampleInitPopulation | 7.9139 | 9.6492
4 | SendToRunner | 6.4504 | 7.8648
5 | EvoSearch/Evolve/PredictNormalizedScore | 2.7957 | 3.4087
6 | MeasureCallback/UpdateCostModel | 2.7350 | 3.3348
7 | EvoSearch/Evolve/Misc | 2.6240 | 3.1994
8 | ApplyHistoryBest | 1.1672 | 1.4232
9 | TaskExtraction | 0.4503 | 0.5490
10 | InitializeTask | 0.0250 | 0.0304
11 | MeasureCallback/AddToDatabase | 0.0198 | 0.0241
12 | EvoSearch/PickWithEpsGreedy | 0.0100 | 0.0122
13 | MeasureCallback/RemoveBuildArtifact | 0.0034 | 0.0041
14 | EvoSearch/PickBestFromDatabase | 0.0032 | 0.0039
15 | MeasureCallback/EchoStatistics | 0.0027 | 0.0033
16 | JoinRunnerFutures | 0.0013 | 0.0016
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
```
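The percentage column in the profiler table looks like each phase's time divided by the total. The sketch below recomputes it for the top entries; the relation is inferred from the numbers, and `percentage` is a hypothetical helper, not a TVM function.

```python
# Recompute the "Percentage" column from "Time (min)". The relation
# percentage = 100 * time / total is an assumption inferred from the table.

TOTAL_MIN = 82.0160  # "Total" row of the profiler table


def percentage(part_min: float, total_min: float = TOTAL_MIN) -> float:
    """Share of the total tuning time spent in one phase, in percent."""
    return 100.0 * part_min / total_min


# Top entries copied from the profiler table: (phase, minutes).
phases = [
    ("EvoSearch/Evolve/Mutation", 42.1468),
    ("SendToBuilder", 15.1365),
    ("EvoSearch/SampleInitPopulation", 7.9139),
    ("SendToRunner", 6.4504),
]

for phase, minutes in phases:
    print(f"{phase}: {percentage(minutes):.4f}%")
```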