adriangb opened a new pull request, #9511:
URL: https://github.com/apache/arrow-rs/pull/9511

   ## Motivation
   
   When working with `BooleanArray`, a common pattern is checking whether *any* 
true or false value exists — e.g.
   `arr.true_count() > 0` or `arr.false_count() == 0`. This currently requires 
`true_count()` / `false_count()`, which scan the **entire** bitmap to count 
every set bit (via `popcount`), even though we only need to know if at least 
one exists.
   
   This PR adds `has_true()` and `has_false()` methods that short-circuit as 
soon as they find a matching value, providing both:
   
   1. **Better performance** — faster on large arrays in the best case
   2. **More ergonomic API** — `arr.has_true()` expresses intent more clearly 
than `arr.true_count() > 0`
   
   ## Callsites in DataFusion
   
   There are several places in DataFusion that would benefit from these methods:
   
   - **`datafusion/functions-nested/src/array_has.rs`** — 
`eq_array.true_count() > 0` → `eq_array.has_true()`
   - **`datafusion/physical-plan/src/topk/mod.rs`** — `filter.true_count() == 
0` check → `!filter.has_true()`
   - **`datafusion/datasource-parquet/src/metadata.rs`** — 
`exactness.true_count() == 0` and `combined_mask.true_count() > 0`
   - **`datafusion/physical-plan/src/joins/nested_loop_join.rs`** — 
`bitmap.true_count() == 0` checks
   - **`datafusion/physical-expr-common/src/physical_expr.rs`** — 
`selection_count == 0` from `selection.true_count()`
   - **`datafusion/physical-expr/src/expressions/binary.rs`** — short-circuit 
checks for AND/OR
   
   ## Benchmark Results
   
   ```
   Scenario                          true_count     has_true       has_false    
  Speedup (best)
   
─────────────────────────────────────────────────────────────────────────────────────────────
   all_true, 64                      4.32 ns        4.08 ns        4.76 ns      
  ~1.1x
   all_false, 64                     4.30 ns        4.25 ns        4.52 ns      
  ~1.0x
   all_true, 1024                    5.15 ns        4.52 ns        4.99 ns      
  ~1.1x
   all_false, 1024                   5.17 ns        4.55 ns        5.00 ns      
  ~1.1x
   mixed_early, 1024                 5.22 ns        —              5.04 ns      
  ~1.0x
   nulls_all_true, 1024              12.84 ns       4.10 ns        12.92 ns     
  ~3.1x
   all_true, 65536                   100.06 ns      5.96 ns        49.70 ns     
  ~16.8x (has_true)
   all_false, 65536                  99.33 ns       49.30 ns       6.19 ns      
  ~16.0x (has_false)
   mixed_early, 65536                100.10 ns      —              6.20 ns      
  ~16.1x (has_false)
   nulls_all_true, 65536             522.94 ns      4.05 ns        521.82 ns    
  ~129x (has_true)
   ```
   
   The key wins are on larger arrays (65,536 elements), where 
`has_true`/`has_false` are **up to 16-129x faster** than
   `true_count()` in best-case scenarios (early short-circuit). Even in worst 
case (must scan entire array), performance is
   comparable to `true_count`.
   
   ## Implementation
   
   The implementation processes bits in 64-bit chunks using 
`UnalignedBitChunk`, which handles arbitrary bit offsets and aligns
   data for SIMD-friendly processing.
   
   - **`has_true` (no nulls):** OR-folds 64-bit chunks, short-circuits when any 
bit is set
   - **`has_false` (no nulls):** AND-folds 64-bit chunks, short-circuits when 
any bit is unset (with padding bits masked to 1)
   - **With nulls:** Iterates paired `(null, value)` chunks, checking `null & 
value != 0` (has_true) or `null & !value != 0`
   (has_false)
   
   ### Alternatives considered
   
   1. **Fully vectorized (no early stopping):** Would process the entire bitmap 
like `true_count()` but with simpler bitwise ops
     instead of popcount. Marginally faster than `true_count()` but misses the 
main optimization opportunity.
   2. **Per-element iteration with early stopping:** `self.iter().any(|v| v == 
Some(true))`. Simple but processes one bit at a
   time, missing SIMD vectorization of the inner loop. Our approach processes 
64 bits at a time while still supporting early
   exit.
   
   The chosen approach balances SIMD-friendly bulk processing (64 bits per 
iteration) with early termination, giving the best of
     both worlds.
   
   ## Test Plan
   
   - Unit tests covering: all-true, all-false, mixed, empty, nullable 
(all-valid-true, all-valid-false, all-null), non-aligned
   lengths (65 elements, 64+1 with trailing false)
   - Criterion benchmarks comparing `has_true`/`has_false` vs `true_count` 
across sizes (64, 1024, 65536) and data distributions
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to