neilconway commented on PR #20588:
URL: https://github.com/apache/datafusion/pull/20588#issuecomment-3993891789
Alright, I implemented a variant where we do row conversion in chunks of 256
rows. Here are the results on the Hetzner box:
```
group                                          base                                  target
-----                                          ----                                  ------
array_has_all/all_found_small_needle/10        4.81     6.8±0.04ms ? ?/sec      1.00  1422.9±33.96µs ? ?/sec
array_has_all/all_found_small_needle/100       1.62    16.6±0.04ms ? ?/sec      1.00    10.2±0.03ms ? ?/sec
array_has_all/all_found_small_needle/500       1.19    59.4±0.09ms ? ?/sec      1.00    49.8±0.12ms ? ?/sec
array_has_all/not_all_found/10                 5.85     6.5±0.03ms ? ?/sec      1.00  1115.8±9.24µs ? ?/sec
array_has_all/not_all_found/100                1.71    15.0±0.05ms ? ?/sec      1.00     8.8±0.03ms ? ?/sec
array_has_all/not_all_found/500                1.22    52.5±0.11ms ? ?/sec      1.00    43.0±0.09ms ? ?/sec
array_has_all_strings/all_found/10             2.71     5.3±0.03ms ? ?/sec      1.00  1948.9±7.79µs ? ?/sec
array_has_all_strings/all_found/100            1.43    15.8±0.04ms ? ?/sec      1.00    11.1±0.04ms ? ?/sec
array_has_all_strings/all_found/500            1.18    61.0±0.14ms ? ?/sec      1.00    51.6±0.62ms ? ?/sec
array_has_all_strings/not_all_found/10         3.05     4.1±0.02ms ? ?/sec      1.00  1338.3±65.23µs ? ?/sec
array_has_all_strings/not_all_found/100        1.48    14.2±0.08ms ? ?/sec      1.00     9.6±0.05ms ? ?/sec
array_has_all_strings/not_all_found/500        1.23    75.4±0.17ms ? ?/sec      1.00    61.2±0.19ms ? ?/sec
array_has_any/no_match/10                      3.46     7.8±0.05ms ? ?/sec      1.00     2.2±0.01ms ? ?/sec
array_has_any/no_match/100                     1.35    25.3±0.11ms ? ?/sec      1.00    18.7±0.03ms ? ?/sec
array_has_any/no_match/500                     1.14   105.4±0.13ms ? ?/sec      1.00    92.8±2.97ms ? ?/sec
array_has_any/scalar_no_match/10               1.11     2.4±0.01ms ? ?/sec      1.00     2.2±0.01ms ? ?/sec
array_has_any/scalar_no_match/100              1.10    22.9±0.06ms ? ?/sec      1.00    20.8±0.06ms ? ?/sec
array_has_any/scalar_no_match/500              1.06   148.5±0.64ms ? ?/sec      1.00   140.2±1.91ms ? ?/sec
array_has_any/scalar_some_match/10             1.07  1133.4±3.89µs ? ?/sec      1.00  1061.6±4.64µs ? ?/sec
array_has_any/scalar_some_match/100            1.04    11.6±0.16ms ? ?/sec      1.00    11.2±0.08ms ? ?/sec
array_has_any/scalar_some_match/500            1.05    90.9±0.71ms ? ?/sec      1.00    87.0±0.88ms ? ?/sec
array_has_any/some_match/10                    5.26     6.6±0.05ms ? ?/sec      1.00  1264.5±3.59µs ? ?/sec
array_has_any/some_match/100                   1.60    15.7±0.08ms ? ?/sec      1.00     9.8±0.03ms ? ?/sec
array_has_any/some_match/500                   1.17    55.9±0.20ms ? ?/sec      1.00    47.8±0.33ms ? ?/sec
array_has_any_scalar/i64_no_match/1            1.06   396.6±2.17µs ? ?/sec      1.00   372.8±3.30µs ? ?/sec
array_has_any_scalar/i64_no_match/10           1.01   449.7±8.66µs ? ?/sec      1.00  446.0±10.76µs ? ?/sec
array_has_any_scalar/i64_no_match/100          1.02  639.2±20.48µs ? ?/sec      1.00  628.6±17.24µs ? ?/sec
array_has_any_scalar/i64_no_match/1000         1.00  545.1±10.73µs ? ?/sec      1.00  544.1±13.21µs ? ?/sec
array_has_any_scalar/string_no_match/1         1.00   250.5±2.16µs ? ?/sec      1.03   257.9±8.09µs ? ?/sec
array_has_any_scalar/string_no_match/10        1.00   418.3±6.45µs ? ?/sec      1.00   419.4±6.58µs ? ?/sec
array_has_any_scalar/string_no_match/100       1.00  544.9±22.43µs ? ?/sec      1.01  550.0±24.24µs ? ?/sec
array_has_any_scalar/string_no_match/1000      1.00   457.7±8.87µs ? ?/sec      1.00   459.1±6.78µs ? ?/sec
array_has_any_strings/no_match/10              2.12     5.2±0.02ms ? ?/sec      1.00     2.4±0.01ms ? ?/sec
array_has_any_strings/no_match/100             1.21    22.5±0.07ms ? ?/sec      1.00    18.6±0.20ms ? ?/sec
array_has_any_strings/no_match/500             1.11   141.5±0.18ms ? ?/sec      1.00   127.2±0.39ms ? ?/sec
array_has_any_strings/scalar_no_match/10       1.00   861.4±1.90µs ? ?/sec      1.06   909.8±1.83µs ? ?/sec
array_has_any_strings/scalar_no_match/100      1.00     7.4±0.06ms ? ?/sec      1.08     8.0±0.14ms ? ?/sec
array_has_any_strings/scalar_no_match/500      1.02    93.9±0.13ms ? ?/sec      1.00    91.7±0.23ms ? ?/sec
array_has_any_strings/scalar_some_match/10     1.05   827.3±3.93µs ? ?/sec      1.00   788.8±3.78µs ? ?/sec
array_has_any_strings/scalar_some_match/100    1.01     5.2±0.17ms ? ?/sec      1.00     5.1±0.14ms ? ?/sec
array_has_any_strings/scalar_some_match/500    1.00    17.7±0.11ms ? ?/sec      1.04    18.5±0.15ms ? ?/sec
array_has_any_strings/some_match/10            2.56     4.5±0.01ms ? ?/sec      1.00  1758.6±7.71µs ? ?/sec
array_has_any_strings/some_match/100           1.36    14.4±0.07ms ? ?/sec      1.00    10.6±0.06ms ? ?/sec
array_has_any_strings/some_match/500           1.10    54.9±1.41ms ? ?/sec      1.00    50.1±0.20ms ? ?/sec
array_has_i64/found/10                         1.00   144.9±4.94µs ? ?/sec      1.02   147.7±4.93µs ? ?/sec
array_has_i64/found/100                        1.00  570.5±31.30µs ? ?/sec      1.06  605.6±35.62µs ? ?/sec
array_has_i64/found/500                        1.00     4.4±0.15ms ? ?/sec      1.02     4.5±0.12ms ? ?/sec
array_has_i64/not_found/10                     1.03    68.8±0.44µs ? ?/sec      1.00    67.0±1.26µs ? ?/sec
array_has_i64/not_found/100                    1.02  471.6±27.43µs ? ?/sec      1.00  462.7±22.65µs ? ?/sec
array_has_i64/not_found/500                    1.00     4.5±0.11ms ? ?/sec      1.00     4.5±0.11ms ? ?/sec
array_has_strings/found/10                     1.10   744.8±5.29µs ? ?/sec      1.00   679.9±5.94µs ? ?/sec
array_has_strings/found/100                    1.00     2.7±0.03ms ? ?/sec      1.00     2.7±0.04ms ? ?/sec
array_has_strings/found/500                    1.00    15.6±0.21ms ? ?/sec      1.05    16.3±0.35ms ? ?/sec
array_has_strings/not_found/10                 1.02   150.5±0.36µs ? ?/sec      1.00   147.0±1.14µs ? ?/sec
array_has_strings/not_found/100                1.11     6.5±0.04ms ? ?/sec      1.00     5.9±0.08ms ? ?/sec
array_has_strings/not_found/500                1.03    16.5±0.04ms ? ?/sec      1.00    16.0±0.07ms ? ?/sec
```
Happily, this seems to address the regressions we saw on large arrays with
the initial approach. Less happily, 256-row chunking performs slightly worse
than full-batch row conversion on my M4 Max machine, and interestingly the
regressions are confined to the i64 benchmarks:
```
array_has_all (general/i64):
┌───────────────────┬────────────────────────────────┐
│ Benchmark │ change (chunked vs full-batch) │
├───────────────────┼────────────────────────────────┤
│ all_found/10 │ +9.6% slower │
├───────────────────┼────────────────────────────────┤
│ not_all_found/10 │ +9.0% slower │
├───────────────────┼────────────────────────────────┤
│ all_found/100 │ +9.2% slower │
├───────────────────┼────────────────────────────────┤
│ not_all_found/100 │ +10.0% slower │
├───────────────────┼────────────────────────────────┤
│ all_found/500 │ +5.9% slower │
├───────────────────┼────────────────────────────────┤
│ not_all_found/500 │ +5.5% slower │
└───────────────────┴────────────────────────────────┘
array_has_any (general/i64):
┌────────────────┬────────────────────────────────┐
│ Benchmark │ change (chunked vs full-batch) │
├────────────────┼────────────────────────────────┤
│ some_match/10 │ +4.4% slower │
├────────────────┼────────────────────────────────┤
│ no_match/10 │ +3.4% slower │
├────────────────┼────────────────────────────────┤
│ some_match/100 │ +4.4% slower │
├────────────────┼────────────────────────────────┤
│ no_match/100 │ +4.0% slower │
├────────────────┼────────────────────────────────┤
│ some_match/500 │ +2.8% slower │
├────────────────┼────────────────────────────────┤
│ no_match/500 │ +2.4% slower │
└────────────────┴────────────────────────────────┘
```
The string benchmarks were much closer, essentially within the noise.
Avoiding the regressions on large arrays seems worth the small performance
hit on M4 machines, but it's probably worth trying a larger chunk size to
see whether that closes the gap.
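For reference, the chunking pattern under discussion can be sketched library-agnostically. This is illustrative only, not the PR's actual code (which converts Arrow arrays into the row format via Arrow's row conversion machinery); `process_in_chunks` and `CHUNK_SIZE` are hypothetical names, and the closure stands in for the per-chunk convert-and-compare work:

```rust
/// Chunk size mirroring the 256-row granularity discussed above
/// (illustrative constant; the real value would be tuned per the
/// benchmark results).
const CHUNK_SIZE: usize = 256;

/// Process `rows` in fixed-size chunks rather than converting the whole
/// batch at once, which bounds the peak memory held by the converted
/// representation at any one time.
fn process_in_chunks<T, R>(rows: &[T], mut per_chunk: impl FnMut(&[T]) -> Vec<R>) -> Vec<R> {
    let mut out = Vec::with_capacity(rows.len());
    // `chunks` yields slices of at most CHUNK_SIZE elements; the final
    // chunk may be shorter.
    for chunk in rows.chunks(CHUNK_SIZE) {
        out.extend(per_chunk(chunk));
    }
    out
}

fn main() {
    let data: Vec<i64> = (0..1000).collect();
    // Toy stand-in for row conversion: double each value, 256 rows at a time.
    let result = process_in_chunks(&data, |chunk| chunk.iter().map(|x| x * 2).collect());
    assert_eq!(result.len(), 1000);
    assert_eq!(result[999], 1998);
    println!("ok");
}
```

The trade-off this makes explicit is the one the benchmarks show: smaller chunks cap transient allocation for large arrays, at the cost of more per-chunk setup overhead, so the fixed-overhead term grows as the chunk size shrinks.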
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]