shingjan commented on PR #12141:
URL: https://github.com/apache/tvm/pull/12141#issuecomment-1197832847

   bert base cuda:
   ```
    ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     0 |                                                        fused_take |         1 |      1 |         0.0005 |       2.1319 |                2.1319 |      5 |
     1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        36.6140 |      32.8897 |               32.8897 |     32 |          Y
     2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |        13.5008 |       3.6407 |               43.6879 |      6 |          Y
     3 |                                                    fused_variance |    147520 |     25 |        65.9260 |       2.2377 |               55.9415 |     32 |          Y
     4 |                                                        fused_mean |     49216 |     25 |        21.9872 |       2.2384 |               55.9597 |     32 |          Y
     5 |                                               fused_cast_take_add |     49152 |      1 |        20.9740 |       2.3435 |                2.3435 |      6 |
     6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |        20.6382 |       2.3816 |               57.1585 |      6 |          Y
     7 |                                          fused_reshape_divide_add |     98304 |     12 |        43.8752 |       2.2405 |               26.8864 |      6 |          Y
     8 |                                             fused_nn_fast_softmax |   4374528 |     12 |      1141.5252 |       3.8322 |               45.9861 |     32 |          Y
     9 |                                                     fused_reshape |         1 |     12 |         0.0005 |       2.1836 |               26.2035 |      6 |          Y
    10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       684.4451 |       9.1921 |              220.6093 |     32 |          Y
    11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0005 |       2.1763 |               26.1151 |      6 |          Y
    12 |                                                    fused_nn_dense |  75497472 |     48 |       918.1956 |      82.2237 |             3946.7393 |     32 |          Y
    13 |                                                   fused_reshape_1 |         1 |     24 |         0.0005 |       2.1895 |               52.5487 |      6 |          Y
    14 |                                                  fused_nn_dense_1 | 301989888 |     12 |      2381.8300 |     126.7890 |             1521.4682 |     32 |          Y
    15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |      4892.7944 |       3.1745 |               38.0936 |      6 |          Y
    16 |                                                  fused_nn_dense_2 | 301989888 |     12 |      1758.6493 |     171.7170 |             2060.6034 |     32 |          Y
    17 |                                             fused_reshape_add_add |     98304 |     24 |        39.4395 |       2.4925 |               59.8207 |      6 |          Y
    18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        72.4898 |       2.7131 |               67.8275 |      6 |          Y
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   ```
   profiler table:
   ```
    ID |                                        Name | Time (min) | Percentage 
   ----------------------------------------------------------------------------
       |                                       Total |    22.2403 |   100.0000 
     1 |     EvoSearch/Evolve/PredictNormalizedScore |     9.5203 |    42.8065 
     2 |                   EvoSearch/Evolve/Mutation |     3.3615 |    15.1146 
     3 |                               SendToBuilder |     2.3562 |    10.5943 
     4 |              EvoSearch/SampleInitPopulation |     2.3124 |    10.3975 
     5 |                       EvoSearch/Evolve/Misc |     2.1767 |     9.7870 
     6 |                                SendToRunner |     1.6900 |     7.5987 
     7 |                            ApplyHistoryBest |     0.3483 |     1.5662 
     8 |                              TaskExtraction |     0.2121 |     0.9535 
     9 |             MeasureCallback/UpdateCostModel |     0.0500 |     0.2248 
    10 |              EvoSearch/PickBestFromDatabase |     0.0158 |     0.0710 
    11 |                              InitializeTask |     0.0095 |     0.0429 
    12 |                 EvoSearch/PickWithEpsGreedy |     0.0069 |     0.0310 
    13 |               MeasureCallback/AddToDatabase |     0.0029 |     0.0130 
    14 |         MeasureCallback/RemoveBuildArtifact |     0.0008 |     0.0037 
    15 |              MeasureCallback/EchoStatistics |     0.0006 |     0.0028 
    16 |                           JoinRunnerFutures |     0.0003 |     0.0012 
    17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
   ----------------------------------------------------------------------------
   ```
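
For readers cross-checking the first table, here is a minimal sketch of how its derived columns appear to relate to the raw ones. The formulas are an assumption inferred from the reported numbers, not something stated in the comment: speed looks like FLOP divided by latency, and weighted latency looks like weight times latency. The helper names `speed_gflops` and `weighted_latency_us` are hypothetical and not part of TVM.

```python
# Rows copied from the tuning table above: (name, FLOP, weight, latency in us).
rows = [
    ("fused_nn_dense_add_fast_tanh", 1204224, 1, 32.8897),
    ("fused_nn_batch_matmul", 6291456, 24, 9.1921),
    ("fused_nn_dense", 75497472, 48, 82.2237),
]


def speed_gflops(flop, latency_us):
    """Assumed relation: FLOP / latency, converted from FLOP/us to GFLOPS."""
    return flop / (latency_us * 1e-6) / 1e9


def weighted_latency_us(weight, latency_us):
    """Assumed relation: per-call latency scaled by the task weight."""
    return weight * latency_us


for name, flop, weight, lat in rows:
    print(
        f"{name}: {speed_gflops(flop, lat):.4f} GFLOPS, "
        f"weighted {weighted_latency_us(weight, lat):.4f} us"
    )
```

The recomputed values should agree with the table only up to rounding, since the table reports latencies to four decimal places.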

