zhangxffff commented on PR #20694:
URL: https://github.com/apache/datafusion/pull/20694#issuecomment-4001877753

   > I'm curious if the `null_count` short-circuit helps in practice -- can you 
re-run the benchmarks when you get a chance?
   
   Benchmark results (before vs after vs after_null_count):
   before: datafusion/main
   after: original patch
   after_null_count: patch with `null_count` guard
   
   For the nulls=20% cases: the `after` version showed ~3-5% regressions
   because it calls `true_count()` on every iteration. `after_null_count`
   eliminates this and matches `before` (e.g. list=28/match=100%/nulls=20%:
   100.9µs, vs 104.5µs in `after` and 100.3µs in `before`).
   
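   For the in_list_cols/Utf8 cases: the benchmark implicitly contains ~20%
   nulls, so the `null_count() == 0` guard similarly removes most of the
   regression (e.g. Utf8/list=3/match=50%: 92.3µs vs 105.2µs in `after`).
   The match=0% cases improve but are still 7-10% slower than `before`
   (e.g. Utf8/list=8/match=0%: 41.0µs vs 37.7µs in `before`).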
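
For readers without the diff open, here is a minimal sketch of the guard described above. It is an illustration under assumptions, not the code in this PR: `in_list_cols`, its signature, and the `Vec<Option<i32>>` columns are hypothetical stand-ins for the Arrow arrays, and it assumes the guard is applied to the needle column's null count. The idea is that a NULL needle row can never evaluate to true, so the "all rows already matched" early exit is unreachable and counting trues on every iteration is wasted work.

```rust
/// Hypothetical stand-in for `x IN (col_a, col_b, ...)` over nullable i32 columns.
/// Returns the per-row SQL result and how many list columns were compared.
fn in_list_cols(
    needle: &[Option<i32>],
    list: &[Vec<Option<i32>>],
) -> (Vec<Option<bool>>, usize) {
    let len = needle.len();
    // Computed once up front; Arrow arrays expose a cached null count instead.
    let needle_nulls = needle.iter().filter(|v| v.is_none()).count();
    let mut matched = vec![false; len];
    let mut saw_null = vec![false; len];
    let mut compared = 0;

    for col in list {
        compared += 1;
        for i in 0..len {
            match (needle[i], col[i]) {
                (Some(a), Some(b)) => matched[i] |= a == b,
                // Comparing with NULL yields NULL, never true.
                _ => saw_null[i] = true,
            }
        }
        // Guard: only pay for the true-count scan when an all-true result is
        // possible, i.e. when the needle column has no NULL rows.
        if needle_nulls == 0 && matched.iter().filter(|m| **m).count() == len {
            break;
        }
    }

    let result = (0..len)
        .map(|i| {
            if matched[i] {
                Some(true)
            } else if saw_null[i] {
                None
            } else {
                Some(false)
            }
        })
        .collect();
    (result, compared)
}
```

With a null-free needle that fully matches the first list column, the loop stops after one comparison; with any NULL in the needle, the true-count scan is skipped and every list column is compared.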
   
   ```
   (zhangxffff) zhangxffff@95d3d60664da ~/W/datafusion ((bcc52cd4))> critcmp before after after_null_count
   group                                              after                                  after_null_count                       before
   -----                                              -----                                  ----------------                       ------
   in_list_cols/Int32/list=28/match=0%/nulls=0%       1.01     92.8±0.72µs        ? ?/sec    1.02     93.3±1.48µs        ? ?/sec    1.00     91.7±2.11µs        ? ?/sec
   in_list_cols/Int32/list=28/match=0%/nulls=20%      1.04    104.4±0.97µs        ? ?/sec    1.00    100.2±2.29µs        ? ?/sec    1.00    100.6±3.25µs        ? ?/sec
   in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00      3.3±0.03µs        ? ?/sec    1.00      3.3±0.03µs        ? ?/sec    27.42    91.6±1.31µs        ? ?/sec
   in_list_cols/Int32/list=28/match=100%/nulls=20%    1.04    104.5±1.20µs        ? ?/sec    1.01    100.9±2.39µs        ? ?/sec    1.00    100.3±1.70µs        ? ?/sec
   in_list_cols/Int32/list=28/match=50%/nulls=0%      1.02     50.4±0.94µs        ? ?/sec    1.00     49.7±0.51µs        ? ?/sec    1.84     91.4±1.78µs        ? ?/sec
   in_list_cols/Int32/list=28/match=50%/nulls=20%     1.05    104.7±1.90µs        ? ?/sec    1.00     99.7±0.86µs        ? ?/sec    1.01    101.0±3.55µs        ? ?/sec
   in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00      9.9±0.10µs        ? ?/sec    1.00      9.9±0.13µs        ? ?/sec    1.00      9.9±0.08µs        ? ?/sec
   in_list_cols/Int32/list=3/match=0%/nulls=20%       1.03     10.9±0.10µs        ? ?/sec    1.00     10.6±0.17µs        ? ?/sec    1.01     10.8±0.12µs        ? ?/sec
   in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00      3.3±0.03µs        ? ?/sec    1.00      3.3±0.08µs        ? ?/sec    2.97      9.9±0.25µs        ? ?/sec
   in_list_cols/Int32/list=3/match=100%/nulls=20%     1.03     10.8±0.10µs        ? ?/sec    1.00     10.5±0.10µs        ? ?/sec    1.03     10.8±0.16µs        ? ?/sec
   in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00      9.9±0.09µs        ? ?/sec    1.02     10.1±0.24µs        ? ?/sec    1.00      9.9±0.16µs        ? ?/sec
   in_list_cols/Int32/list=3/match=50%/nulls=20%      1.02     10.8±0.09µs        ? ?/sec    1.00     10.5±0.15µs        ? ?/sec    1.03     10.8±0.17µs        ? ?/sec
   in_list_cols/Int32/list=8/match=0%/nulls=0%        1.01     26.5±0.19µs        ? ?/sec    1.02     26.8±0.48µs        ? ?/sec    1.00     26.1±0.26µs        ? ?/sec
   in_list_cols/Int32/list=8/match=0%/nulls=20%       1.03     29.4±0.28µs        ? ?/sec    1.00     28.6±0.58µs        ? ?/sec    1.00     28.7±0.51µs        ? ?/sec
   in_list_cols/Int32/list=8/match=100%/nulls=0%      1.01      3.3±0.05µs        ? ?/sec    1.00      3.3±0.03µs        ? ?/sec    7.92     26.3±0.74µs        ? ?/sec
   in_list_cols/Int32/list=8/match=100%/nulls=20%     1.05     29.6±0.47µs        ? ?/sec    1.00     28.2±0.40µs        ? ?/sec    1.02     28.7±0.70µs        ? ?/sec
   in_list_cols/Int32/list=8/match=50%/nulls=0%       1.01     26.6±0.28µs        ? ?/sec    1.01     26.6±0.53µs        ? ?/sec    1.00     26.3±0.39µs        ? ?/sec
   in_list_cols/Int32/list=8/match=50%/nulls=20%      1.03     29.5±0.36µs        ? ?/sec    1.00     28.6±0.55µs        ? ?/sec    1.00     28.5±0.28µs        ? ?/sec
   in_list_cols/Utf8/list=28/match=0%                 1.18    157.0±3.90µs        ? ?/sec    1.10    146.1±2.62µs        ? ?/sec    1.00    132.7±2.97µs        ? ?/sec
   in_list_cols/Utf8/list=28/match=100%               1.09    722.5±9.38µs        ? ?/sec    1.00    665.8±9.49µs        ? ?/sec    1.08    722.0±6.94µs        ? ?/sec
   in_list_cols/Utf8/list=28/match=50%                1.01  1068.6±16.04µs        ? ?/sec    1.01  1064.5±19.03µs        ? ?/sec    1.00  1053.9±14.52µs        ? ?/sec
   in_list_cols/Utf8/list=3/match=0%                  1.14     16.2±0.38µs        ? ?/sec    1.07     15.3±0.28µs        ? ?/sec    1.00     14.2±0.24µs        ? ?/sec
   in_list_cols/Utf8/list=3/match=100%                1.03     67.7±1.22µs        ? ?/sec    1.00     65.6±0.80µs        ? ?/sec    1.03     67.8±2.21µs        ? ?/sec
   in_list_cols/Utf8/list=3/match=50%                 1.14    105.2±1.65µs        ? ?/sec    1.00     92.3±1.64µs        ? ?/sec    1.04     96.3±5.61µs        ? ?/sec
   in_list_cols/Utf8/list=8/match=0%                  1.19     44.9±1.11µs        ? ?/sec    1.09     41.0±0.64µs        ? ?/sec    1.00     37.7±0.87µs        ? ?/sec
   in_list_cols/Utf8/list=8/match=100%                1.01    194.3±2.14µs        ? ?/sec    1.00    191.7±2.73µs        ? ?/sec    1.02    195.9±2.36µs        ? ?/sec
   in_list_cols/Utf8/list=8/match=50%                 1.02    294.0±2.76µs        ? ?/sec    1.00    287.3±2.73µs        ? ?/sec    1.02    294.0±3.57µs        ? ?/sec
   
   ```
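
A note on reading the ratio columns: as far as I understand critcmp's output, the number in front of each timing is that timing divided by the fastest timing in the same row, so the fastest variant always shows 1.00. The sketch below reproduces that arithmetic for the list=28/match=100%/nulls=20% row; `relative_to_fastest` is a hypothetical helper, not part of critcmp. Because the table prints rounded timings, ratios recomputed from it can differ slightly from the printed ones.

```rust
/// Hypothetical helper: ratio of each timing to the fastest timing in the row.
fn relative_to_fastest(times_us: &[f64]) -> Vec<f64> {
    // The smallest timing in the row is the baseline (ratio 1.00).
    let fastest = times_us.iter().copied().fold(f64::INFINITY, f64::min);
    times_us.iter().map(|t| t / fastest).collect()
}

/// Format the ratios to two decimals, as in the table.
fn format_ratios(times_us: &[f64]) -> Vec<String> {
    relative_to_fastest(times_us)
        .iter()
        .map(|r| format!("{:.2}", r))
        .collect()
}
```

Calling `format_ratios(&[104.5, 100.9, 100.3])` with the after, after_null_count, and before timings from that row gives `["1.04", "1.01", "1.00"]`, matching the table.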


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

