shingjan commented on PR #12141:
URL: https://github.com/apache/tvm/pull/12141#issuecomment-1198305464
BERT-base on LLVM, 20k trials:
```
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_take | 1 | 1 | 0.0001 | 12.9686 | 12.9686 | 1 | Y
1 | fused_nn_dense_add_fast_tanh | 1204224 | 1 | 84.5479 | 14.2431 | 14.2431 | 32 | Y
2 | fused_reshape_add_reshape_transpose_reshape | 49152 | 12 | 5.3101 | 9.2562 | 111.0749 | 1 | Y
3 | fused_variance | 147520 | 25 | 21.8394 | 6.7548 | 168.8690 | 191 | Y
4 | fused_mean | 49216 | 25 | 11.7478 | 4.1894 | 104.7344 | 159 | Y
5 | fused_cast_take_add | 49152 | 1 | 3.6734 | 13.3805 | 13.3805 | 2 | Y
6 | fused_reshape_add_reshape_transpose_reshape_1 | 49152 | 24 | 0.4843 | 101.4931 | 2435.8337 | 1 | Y
7 | fused_reshape_divide_add | 98304 | 12 | 12.6803 | 7.7525 | 93.0296 | 2 | Y
8 | fused_nn_fast_softmax | 4374528 | 12 | 207.0953 | 21.1233 | 253.4791 | 288 | Y
9 | fused_reshape | 1 | 12 | 0.0001 | 12.0269 | 144.3223 | 1 | Y
10 | fused_nn_batch_matmul | 6291456 | 24 | 462.0523 | 13.6163 | 326.7919 | 384 | Y
11 | fused_reshape_transpose_reshape | 1 | 12 | 0.0000 | 66.8140 | 801.7686 | 1 | Y
12 | fused_nn_dense | 75497472 | 48 | 613.1287 | 123.1348 | 5910.4700 | 6656 |
13 | fused_reshape_1 | 1 | 24 | 0.0000 | 49.1952 | 1180.6855 | 1 | Y
14 | fused_nn_dense_1 | 301989888 | 12 | 664.1287 | 454.7159 | 5456.5913 | 6144 |
15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape | 15532032 | 12 | 32.6868 | 475.1782 | 5702.1385 | 1 | Y
16 | fused_nn_dense_2 | 301989888 | 12 | 662.0116 | 456.1701 | 5474.0410 | 6144 |
17 | fused_reshape_add_add | 98304 | 24 | 1.3333 | 73.7283 | 1769.4793 | 2 | Y
18 | fused_subtract_add_sqrt_divide_multiply_add | 196672 | 25 | 2.6162 | 75.1739 | 1879.3469 | 2 | Y
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 20013
Total latency (us): 31853.2
```
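For readers checking how the columns in the table above relate, here is a minimal sketch that recomputes Speed and Weighted Latency for three rows. It assumes Speed (GFLOPS) = FLOP / latency and Weighted Latency = Weight x Latency. Those relations match the printed numbers up to rounding, but the comment itself does not state them. The helper names `gflops` and `weighted_latency_us` are illustrative and not part of TVM.

```python
# (FLOP, weight, latency in us), copied from three rows of the table above.
rows = {
    "fused_nn_batch_matmul": (6291456, 24, 13.6163),
    "fused_nn_dense": (75497472, 48, 123.1348),
    "fused_nn_dense_1": (301989888, 12, 454.7159),
}


def gflops(flop, latency_us):
    # Assumed relation: FLOP per microsecond is MFLOPS, so divide by 1e3 for GFLOPS.
    return flop / latency_us / 1e3


def weighted_latency_us(weight, latency_us):
    # Assumed relation: weight is how many times the task occurs in the model.
    return weight * latency_us


for name, (flop, weight, latency) in rows.items():
    print(
        f"{name}: {gflops(flop, latency):.4f} GFLOPS, "
        f"{weighted_latency_us(weight, latency):.4f} us weighted"
    )
```

Small differences in the last digits against the table are expected, since the printed latencies are already rounded to four decimals.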
Profiler table:
```
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 359.8455 | 100.0000
1 | SendToRunner | 118.7806 | 33.0088
2 | EvoSearch/Evolve/PredictNormalizedScore | 62.0087 | 17.2320
3 | SendToBuilder | 56.9247 | 15.8192
4 | MeasureCallback/UpdateCostModel | 42.1284 | 11.7074
5 | EvoSearch/Evolve/Mutation | 40.9665 | 11.3845
6 | EvoSearch/Evolve/Misc | 21.9481 | 6.0993
7 | EvoSearch/SampleInitPopulation | 7.9898 | 2.2203
8 | EvoSearch/PickBestFromDatabase | 2.4416 | 0.6785
9 | ApplyHistoryBest | 0.5137 | 0.1428
10 | MeasureCallback/AddToDatabase | 0.1833 | 0.0509
11 | TaskExtraction | 0.1798 | 0.0500
12 | EvoSearch/PickWithEpsGreedy | 0.0540 | 0.0150
13 | MeasureCallback/RemoveBuildArtifact | 0.0453 | 0.0126
14 | InitializeTask | 0.0440 | 0.0122
15 | MeasureCallback/EchoStatistics | 0.0310 | 0.0086
16 | JoinRunnerFutures | 0.0118 | 0.0033
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0116 | 0.0032
----------------------------------------------------------------------------
```
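A rough way to read the profiler table above is to group stages and compare their share of the total. The sketch below assumes Percentage = stage time / total time x 100, which matches the printed values. Grouping `SendToRunner` and `SendToBuilder` as "measurement" is my own illustrative choice, not something the profiler reports.

```python
# Time in minutes, copied from the profiler table above (top five stages only).
total_min = 359.8455
stage_minutes = {
    "SendToRunner": 118.7806,
    "EvoSearch/Evolve/PredictNormalizedScore": 62.0087,
    "SendToBuilder": 56.9247,
    "MeasureCallback/UpdateCostModel": 42.1284,
    "EvoSearch/Evolve/Mutation": 40.9665,
}


def share(minutes):
    # Assumed relation: percentage of total wall time spent in a stage.
    return minutes / total_min * 100.0


# Hypothetical grouping: building plus running candidates on the target.
measurement_min = stage_minutes["SendToRunner"] + stage_minutes["SendToBuilder"]

print(f"SendToRunner: {share(stage_minutes['SendToRunner']):.4f}%")
print(f"build + run:  {share(measurement_min):.2f}%")
```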