shingjan commented on PR #12141:
URL: https://github.com/apache/tvm/pull/12141#issuecomment-1197832847
BERT-base on CUDA:
```
ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 0 |                                                        fused_take |         1 |      1 |         0.0005 |       2.1319 |                2.1319 |      5 |
 1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        36.6140 |      32.8897 |               32.8897 |     32 |          Y
 2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |        13.5008 |       3.6407 |               43.6879 |      6 |          Y
 3 |                                                    fused_variance |    147520 |     25 |        65.9260 |       2.2377 |               55.9415 |     32 |          Y
 4 |                                                        fused_mean |     49216 |     25 |        21.9872 |       2.2384 |               55.9597 |     32 |          Y
 5 |                                               fused_cast_take_add |     49152 |      1 |        20.9740 |       2.3435 |                2.3435 |      6 |
 6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |        20.6382 |       2.3816 |               57.1585 |      6 |          Y
 7 |                                          fused_reshape_divide_add |     98304 |     12 |        43.8752 |       2.2405 |               26.8864 |      6 |          Y
 8 |                                             fused_nn_fast_softmax |   4374528 |     12 |      1141.5252 |       3.8322 |               45.9861 |     32 |          Y
 9 |                                                     fused_reshape |         1 |     12 |         0.0005 |       2.1836 |               26.2035 |      6 |          Y
10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       684.4451 |       9.1921 |              220.6093 |     32 |          Y
11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0005 |       2.1763 |               26.1151 |      6 |          Y
12 |                                                    fused_nn_dense |  75497472 |     48 |       918.1956 |      82.2237 |             3946.7393 |     32 |          Y
13 |                                                   fused_reshape_1 |         1 |     24 |         0.0005 |       2.1895 |               52.5487 |      6 |          Y
14 |                                                  fused_nn_dense_1 | 301989888 |     12 |      2381.8300 |     126.7890 |             1521.4682 |     32 |          Y
15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |      4892.7944 |       3.1745 |               38.0936 |      6 |          Y
16 |                                                  fused_nn_dense_2 | 301989888 |     12 |      1758.6493 |     171.7170 |             2060.6034 |     32 |          Y
17 |                                             fused_reshape_add_add |     98304 |     24 |        39.4395 |       2.4925 |               59.8207 |      6 |          Y
18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        72.4898 |       2.7131 |               67.8275 |      6 |          Y
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```
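For anyone reading the table above: the Speed and Weighted Latency columns appear to follow directly from FLOP, Weight, and Latency. The sketch below is inferred from the printed numbers only, not taken from the TVM source, and `derived_columns` is a hypothetical helper name. It assumes Speed = FLOP / latency and Weighted Latency = Weight × Latency.

```python
def derived_columns(flop, weight, latency_us):
    """Return (speed_gflops, weighted_latency_us) for one task row.

    Assumed relationships, inferred from the table rather than from TVM:
      speed            = FLOP / latency
      weighted latency = weight * latency
    """
    # FLOP per microsecond equals MFLOPS; divide by 1e3 to get GFLOPS.
    speed_gflops = flop / latency_us / 1e3
    weighted_latency_us = weight * latency_us
    return speed_gflops, weighted_latency_us


# Row 12 (fused_nn_dense) from the table above.
speed, weighted = derived_columns(flop=75497472, weight=48, latency_us=82.2237)
print(f"speed={speed:.4f} GFLOPS, weighted latency={weighted:.4f} us")
```

The results should match the table up to rounding of the printed latency, which is why the weighted value can differ in the last digits.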
Profiler table:
```
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 22.2403 | 100.0000
1 | EvoSearch/Evolve/PredictNormalizedScore | 9.5203 | 42.8065
2 | EvoSearch/Evolve/Mutation | 3.3615 | 15.1146
3 | SendToBuilder | 2.3562 | 10.5943
4 | EvoSearch/SampleInitPopulation | 2.3124 | 10.3975
5 | EvoSearch/Evolve/Misc | 2.1767 | 9.7870
6 | SendToRunner | 1.6900 | 7.5987
7 | ApplyHistoryBest | 0.3483 | 1.5662
8 | TaskExtraction | 0.2121 | 0.9535
9 | MeasureCallback/UpdateCostModel | 0.0500 | 0.2248
10 | EvoSearch/PickBestFromDatabase | 0.0158 | 0.0710
11 | InitializeTask | 0.0095 | 0.0429
12 | EvoSearch/PickWithEpsGreedy | 0.0069 | 0.0310
13 | MeasureCallback/AddToDatabase | 0.0029 | 0.0130
14 | MeasureCallback/RemoveBuildArtifact | 0.0008 | 0.0037
15 | MeasureCallback/EchoStatistics | 0.0006 | 0.0028
16 | JoinRunnerFutures | 0.0003 | 0.0012
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
```