adriangb opened a new pull request, #9510: URL: https://github.com/apache/arrow-rs/pull/9510
## Which issue does this PR close? Follow-up optimization to #7748 which added `has_true()`/`has_false()` to `BooleanArray`. ## Rationale The initial implementation used `BitChunks::iter_padded()` which yields one `u64` at a time through an opaque chained iterator that prevents LLVM auto-vectorization. Benchmarks showed the worst case (full scan, 65K elements) at ~255ns vs ~100ns for `true_count()` which uses `UnalignedBitChunk` — an aligned `&[u64]` slice that LLVM vectorizes. ## What changes are included in this PR? Switch the **no-nulls** paths of `has_true()` and `has_false()` from `BitChunks::iter_padded()` to `UnalignedBitChunk`. Process the aligned `chunks()` slice in blocks of 64 u64s (4096 bits), using a fold within each block that LLVM can auto-vectorize, and short-circuiting between blocks. - `has_true()`: OR-fold (`acc | c`) per block, short-circuit when `!= 0` - `has_false()`: AND-fold (`acc & c`) per block, short-circuit when `!= u64::MAX` The **with-nulls** paths are unchanged — the two buffers may have different alignment making a shared `&[u64]` zip difficult. ## Benchmark results (65536 elements) | Benchmark | Before | After | Speedup | |---|---|---|---| | `has_true(all_false)` — worst case full scan | ~255ns | **49ns** | **5.2x** | | `has_false(all_true)` — worst case full scan | ~255ns | **50ns** | **5.1x** | | `has_true(all_true)` — short-circuit | ~6ns | 6ns | — | | `has_false(all_false)` — short-circuit | ~6ns | 6ns | — | | `true_count` (for reference) | ~101ns | 101ns | — | The worst-case full scan is now ~2x faster than `true_count()` because the fold uses a simpler per-element operation (OR/AND vs popcount+sum). ## Potential DataFusion callsites These methods are new and not yet adopted in DataFusion, but several existing patterns could benefit: - `datafusion/functions-nested/src/array_has.rs:345` — `eq_array.true_count() > 0` → `eq_array.has_true()` - `datafusion/datasource-parquet/src/metadata.rs:615` — `combined_mask.true_count() > 0` → `combined_mask.has_true()` - `datafusion/physical-plan/src/joins/nested_loop_join.rs:1663,2250` — `bitmap.true_count() == 0` → `!bitmap.has_true()` - `datafusion/functions-nested/src/replace.rs:372` — `eq_array.false_count() == eq_array.len()` → `!eq_array.has_true()` - `datafusion/datasource-parquet/src/metadata.rs:495,514` — `exactness.true_count() == exactness.len()` → `!exactness.has_false()` - `datafusion/datasource-parquet/src/metadata.rs:498,517` — `exactness.true_count() == 0` → `!exactness.has_true()` ## Are these changes tested? Yes — all existing tests pass (`cargo test -p arrow-array -- boolean_array`), including non-aligned and edge case tests added in #7748. 🤖 Generated with [Claude Code](https://claude.com/claude-code) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
