niyue commented on issue #40024:
URL: https://github.com/apache/arrow/issues/40024#issuecomment-1937460521

   I added several micro benchmarks to measure expression compilation performance. The previous micro benchmarks primarily measure execution performance rather than compilation performance.
   
   # All micro benchmarks
   
![image](https://github.com/apache/arrow/assets/27754/19759928-6c5a-44a5-8253-d0dc8612781e)
   * The first 6 micro benchmarks measure compilation performance.
   * This PR is not expected to change the execution performance of the generated code, so apart from the first 6, the remaining benchmarks are largely unchanged.
   
   # The first 6 benchmarks
   
![image](https://github.com/apache/arrow/assets/27754/f21a4314-fb66-4d69-a9bb-3ee6d585376b)
   
   # The first 6 benchmarks (log scale)
   
![image](https://github.com/apache/arrow/assets/27754/6ed769bf-1559-474a-9c19-add61eba894e)
   
   # The detailed benchmark stats
   ### before optimization
   ```
   2024-02-11T15:09:54+08:00
   Running release/gandiva-micro-benchmarks
   Run on (10 X 24.1211 MHz CPU s)
   CPU Caches:
     L1 Data 64 KiB
     L1 Instruction 128 KiB
     L2 Unified 4096 KiB (x10)
   Load Average: 2.59, 5.07, 4.75
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/cache.cc:50: Creating gandiva cache with capacity of 500
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/engine.cc:276: Detected CPU Name : apple-m1
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/engine.cc:277: Detected CPU Features: []
   
   --------------------------------------------------------------------------------------
   Benchmark                                            Time             CPU   Iterations
   --------------------------------------------------------------------------------------
   TimedTestExprCompilationNoCache                  14760 us        14759 us           39
   TimedTestExprCompilationWithCache                  227 us          226 us         3094
   TimedTestNonBitcodeExprCompilationNoCache        13051 us        13047 us           46
   TimedTestNonBitcodeExprCompilationWithCache        238 us          238 us         2916
   TimedTestLiteralExprCompilationNoCache             227 us          227 us         2990
   TimedTestLiteralExprCompilationWithCache           230 us          230 us         3034
   TimedTestAdd3                                     1134 us         1128 us          635
   TimedTestBigNested                                7856 us         7854 us           88
   TimedTestExtractYear                              7183 us         7173 us           98
   TimedTestFilterAdd2                               2828 us         2828 us          249
   TimedTestFilterLike                              12836 us        12833 us           55
   TimedTestCastFloatFromString                     14497 us        14495 us           48
   TimedTestCastIntFromString                       14271 us        14271 us           49
   TimedTestAllocs                                  34164 us        34164 us           21
   TimedTestOutputStringAllocs                      51252 us        51230 us           14
   TimedTestMultiOr                                  9022 us         9022 us           78
   DecimalAdd2Fast                                   2054 us         2048 us          348
   DecimalAdd2LeadingZeroes                          5060 us         5059 us          138
   DecimalAdd2LeadingZeroesWithDiv                  23955 us        23948 us           29
   DecimalAdd2Large                                118613 us       118586 us            6
   DecimalAdd3Fast                                   2340 us         2332 us          304
   DecimalAdd3LeadingZeroes                          8752 us         8751 us           79
   DecimalAdd3LeadingZeroesWithDiv                  60829 us        60811 us           11
   DecimalAdd3Large                                241113 us       241100 us            3
   ```
   
   ### after optimization
   ```
   2024-02-11T15:11:43+08:00
   Running release/gandiva-micro-benchmarks
   Run on (10 X 24.1228 MHz CPU s)
   CPU Caches:
     L1 Data 64 KiB
     L1 Instruction 128 KiB
     L2 Unified 4096 KiB (x10)
   Load Average: 2.83, 4.38, 4.51
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/cache.cc:50: Creating gandiva cache with capacity of 500
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/engine.cc:273: Detected CPU Name : apple-m1
   /Users/ss/dev/projects/opensource/arrow/cpp/src/gandiva/engine.cc:274: Detected CPU Features: []
   
   --------------------------------------------------------------------------------------
   Benchmark                                            Time             CPU   Iterations
   --------------------------------------------------------------------------------------
   TimedTestExprCompilationNoCache                  14382 us        14380 us           39
   TimedTestExprCompilationWithCache                 82.4 us         82.4 us         8394
   TimedTestNonBitcodeExprCompilationNoCache         1255 us         1255 us          499
   TimedTestNonBitcodeExprCompilationWithCache       90.6 us         90.6 us         7689
   TimedTestLiteralExprCompilationNoCache            82.1 us         82.1 us         8528
   TimedTestLiteralExprCompilationWithCache          85.6 us         85.6 us         8167
   TimedTestAdd3                                     1140 us         1133 us          599
   TimedTestBigNested                                7818 us         7817 us           89
   TimedTestExtractYear                              7187 us         7184 us           98
   TimedTestFilterAdd2                               2809 us         2809 us          249
   TimedTestFilterLike                              13097 us        13093 us           54
   TimedTestCastFloatFromString                     14168 us        14168 us           49
   TimedTestCastIntFromString                       14164 us        14159 us           49
   TimedTestAllocs                                  33802 us        33802 us           21
   TimedTestOutputStringAllocs                      50598 us        50592 us           13
   TimedTestMultiOr                                 11379 us        11378 us           63
   TimedTestInExpr                                   2509 us         2509 us          273
   DecimalAdd2Fast                                   2029 us         2029 us          340
   DecimalAdd2LeadingZeroes                          5153 us         5151 us          135
   DecimalAdd2LeadingZeroesWithDiv                  24197 us        24164 us           29
   DecimalAdd2Large                                118994 us       118917 us            6
   DecimalAdd3Fast                                   2281 us         2280 us          295
   DecimalAdd3LeadingZeroes                          8937 us         8935 us           78
   DecimalAdd3LeadingZeroesWithDiv                  60969 us        60966 us           11
   DecimalAdd3Large                                241916 us       241723 us            3
   ```
   
   # Conclusion
   * `TimedTestExprCompilationNoCache` is only slightly faster (around 2.6%, 14760 us vs. 14382 us): compilation itself is faster, but the execution time still dominates this benchmark.
   * `TimedTestExprCompilationWithCache`, `TimedTestNonBitcodeExprCompilationWithCache` and `TimedTestLiteralExprCompilationWithCache` are each roughly 2.6x to 2.8x faster, primarily because we avoid loading the IR and C functions when the cache is hit.
   * `TimedTestNonBitcodeExprCompilationNoCache` and `TimedTestLiteralExprCompilationNoCache` are about 10x and 2.8x faster, respectively. For use cases where only C functions are used, such as `random()`, compilation should be much faster since the LLVM bitcode no longer needs to be loaded and linked.
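
For anyone who wants to double-check the ratios quoted in the conclusion, here is a small sketch that recomputes them from the `Time` column of the two runs above. The numbers are copied from the benchmark output. The dictionary names and the script itself are only illustrative and are not part of the PR.

```python
# Wall-clock times in microseconds for the six compilation benchmarks,
# copied from the "before optimization" and "after optimization" runs.
before_us = {
    "TimedTestExprCompilationNoCache": 14760,
    "TimedTestExprCompilationWithCache": 227,
    "TimedTestNonBitcodeExprCompilationNoCache": 13051,
    "TimedTestNonBitcodeExprCompilationWithCache": 238,
    "TimedTestLiteralExprCompilationNoCache": 227,
    "TimedTestLiteralExprCompilationWithCache": 230,
}
after_us = {
    "TimedTestExprCompilationNoCache": 14382,
    "TimedTestExprCompilationWithCache": 82.4,
    "TimedTestNonBitcodeExprCompilationNoCache": 1255,
    "TimedTestNonBitcodeExprCompilationWithCache": 90.6,
    "TimedTestLiteralExprCompilationNoCache": 82.1,
    "TimedTestLiteralExprCompilationWithCache": 85.6,
}

# Speedup = time before / time after; values above 1.0 mean "faster after".
speedups = {name: before_us[name] / after_us[name] for name in before_us}

for name, ratio in speedups.items():
    print(f"{name:<45} {ratio:6.2f}x")
```

These are single-run figures, so small differences (such as the couple of percent on `TimedTestExprCompilationNoCache`) may be within run-to-run noise.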

