shingjan commented on PR #12141:
URL: https://github.com/apache/tvm/pull/12141#issuecomment-1198305464

   BERT base on LLVM, 20k trials:
   ```
    ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     0 |                                                        fused_take |         1 |      1 |         0.0001 |      12.9686 |               12.9686 |      1 |          Y
     1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        84.5479 |      14.2431 |               14.2431 |     32 |          Y
     2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |         5.3101 |       9.2562 |              111.0749 |      1 |          Y
     3 |                                                    fused_variance |    147520 |     25 |        21.8394 |       6.7548 |              168.8690 |    191 |          Y
     4 |                                                        fused_mean |     49216 |     25 |        11.7478 |       4.1894 |              104.7344 |    159 |          Y
     5 |                                               fused_cast_take_add |     49152 |      1 |         3.6734 |      13.3805 |               13.3805 |      2 |          Y
     6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |         0.4843 |     101.4931 |             2435.8337 |      1 |          Y
     7 |                                          fused_reshape_divide_add |     98304 |     12 |        12.6803 |       7.7525 |               93.0296 |      2 |          Y
     8 |                                             fused_nn_fast_softmax |   4374528 |     12 |       207.0953 |      21.1233 |              253.4791 |    288 |          Y
     9 |                                                     fused_reshape |         1 |     12 |         0.0001 |      12.0269 |              144.3223 |      1 |          Y
    10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       462.0523 |      13.6163 |              326.7919 |    384 |          Y
    11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0000 |      66.8140 |              801.7686 |      1 |          Y
    12 |                                                    fused_nn_dense |  75497472 |     48 |       613.1287 |     123.1348 |             5910.4700 |   6656 |
    13 |                                                   fused_reshape_1 |         1 |     24 |         0.0000 |      49.1952 |             1180.6855 |      1 |          Y
    14 |                                                  fused_nn_dense_1 | 301989888 |     12 |       664.1287 |     454.7159 |             5456.5913 |   6144 |
    15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |        32.6868 |     475.1782 |             5702.1385 |      1 |          Y
    16 |                                                  fused_nn_dense_2 | 301989888 |     12 |       662.0116 |     456.1701 |             5474.0410 |   6144 |
    17 |                                             fused_reshape_add_add |     98304 |     24 |         1.3333 |      73.7283 |             1769.4793 |      2 |          Y
    18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |         2.6162 |      75.1739 |             1879.3469 |      2 |          Y
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Total trials: 20013
   Total latency (us): 31853.2
   ```
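
For readers checking the numbers: the columns of the tuning table appear to be related as Weighted Latency = Weight x Latency and Speed = FLOP / Latency. This is inferred from the reported values, not stated in the comment. A minimal Python sketch that checks the relationship on a few rows copied from the table (the helper names are hypothetical, not TVM APIs):

```python
# Rows copied from the tuning table above:
# (name, flop, weight, latency_us, weighted_latency_us, speed_gflops)
rows = [
    ("fused_nn_dense_add_fast_tanh", 1204224, 1, 14.2431, 14.2431, 84.5479),
    ("fused_nn_fast_softmax", 4374528, 12, 21.1233, 253.4791, 207.0953),
    ("fused_nn_dense", 75497472, 48, 123.1348, 5910.4700, 613.1287),
    ("fused_nn_dense_1", 301989888, 12, 454.7159, 5456.5913, 664.1287),
]


def weighted_latency_us(weight, latency_us):
    # Assumed relationship: each task's latency is scaled by how often it occurs.
    return weight * latency_us


def speed_gflops(flop, latency_us):
    # FLOP / seconds / 1e9 is the same as FLOP / (latency_us * 1e3).
    return flop / (latency_us * 1e3)


for name, flop, weight, lat, reported_wl, reported_speed in rows:
    wl = weighted_latency_us(weight, lat)
    speed = speed_gflops(flop, lat)
    # Differences should be limited to rounding in the printed table.
    print(f"{name}: weighted {wl:.4f} (reported {reported_wl:.4f}), "
          f"speed {speed:.4f} GFLOPS (reported {reported_speed:.4f})")
```

Under the same assumption, summing the Weighted Latency column should reproduce the reported total latency up to rounding.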
   Profiler table:
   ```
    ID |                                        Name | Time (min) | Percentage 
   ----------------------------------------------------------------------------
       |                                       Total |   359.8455 |   100.0000 
     1 |                                SendToRunner |   118.7806 |    33.0088 
     2 |     EvoSearch/Evolve/PredictNormalizedScore |    62.0087 |    17.2320 
     3 |                               SendToBuilder |    56.9247 |    15.8192 
     4 |             MeasureCallback/UpdateCostModel |    42.1284 |    11.7074 
     5 |                   EvoSearch/Evolve/Mutation |    40.9665 |    11.3845 
     6 |                       EvoSearch/Evolve/Misc |    21.9481 |     6.0993 
     7 |              EvoSearch/SampleInitPopulation |     7.9898 |     2.2203 
     8 |              EvoSearch/PickBestFromDatabase |     2.4416 |     0.6785 
     9 |                            ApplyHistoryBest |     0.5137 |     0.1428 
    10 |               MeasureCallback/AddToDatabase |     0.1833 |     0.0509 
    11 |                              TaskExtraction |     0.1798 |     0.0500 
    12 |                 EvoSearch/PickWithEpsGreedy |     0.0540 |     0.0150 
    13 |         MeasureCallback/RemoveBuildArtifact |     0.0453 |     0.0126 
    14 |                              InitializeTask |     0.0440 |     0.0122 
    15 |              MeasureCallback/EchoStatistics |     0.0310 |     0.0086 
    16 |                           JoinRunnerFutures |     0.0118 |     0.0033 
    17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0116 |     0.0032 
   ----------------------------------------------------------------------------
   ```
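
The Percentage column of the profiler table appears to be each scope's time divided by the Total row, again inferred from the values rather than stated in the comment. A small Python sketch over the five largest entries (the function name is hypothetical):

```python
# Total and the five largest scopes, copied from the profiler table (minutes).
total_min = 359.8455
scopes = {
    "SendToRunner": 118.7806,
    "EvoSearch/Evolve/PredictNormalizedScore": 62.0087,
    "SendToBuilder": 56.9247,
    "MeasureCallback/UpdateCostModel": 42.1284,
    "EvoSearch/Evolve/Mutation": 40.9665,
}


def percentage(time_min, total):
    # Assumed relationship: share of total tuning time, in percent.
    return 100.0 * time_min / total


for name, time_min in scopes.items():
    print(f"{name}: {percentage(time_min, total_min):.4f}%")
```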

