alamb opened a new issue, #10098:
URL: https://github.com/apache/arrow-rs/issues/10098

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
[`arrow-select::filter::filter_null_mask`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L578-L588)
 filters validity bitmaps through 
[`filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640),
 which currently either gathers the selected bits by iterating over set-bit 
indices or copies contiguous slices. For dense or irregular predicates, this 
can do more per-bit work than is necessary to compact a source bitmap by a 
predicate bitmap.
   
   **Describe the solution you'd like**
   Add or reuse a `compress(value: u64, mask: u64) -> u64` bit utility, 
equivalent to Intel BMI2 
[`_pext_u64`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_pext_u64),
 in `arrow-buffer` and use it in 
[`arrow-select::filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640).
 The implementation could process `(source_bits, predicate_bits)` in 64-bit 
chunks, append the low `predicate_bits.count_ones()` bits of 
`compress(source_bits, predicate_bits)` to the output, and fall back to the 
existing handling for offsets and remainders.
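
As a rough illustration of the idea above, here is a minimal portable sketch. It is not the code from PR #9848 or from `arrow-select`: the `filter_words` helper, its signature, and the use of plain `u64` word slices (LSB-first, no bit offsets, equal lengths) are assumptions made for this example only. On x86_64 targets with BMI2, `compress` could presumably dispatch to `core::arch::x86_64::_pext_u64` instead of the loop shown here.

```rust
/// Portable software equivalent of BMI2 `pext`: gathers the bits of `value`
/// selected by `mask` into the low bits of the result, preserving order.
pub fn compress(value: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut pos = 0u32;
    while mask != 0 {
        let bit = mask.trailing_zeros();
        out |= ((value >> bit) & 1) << pos;
        pos += 1;
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}

/// Hypothetical helper (not an arrow-rs API): compacts `source` by
/// `predicate`, one 64-bit word at a time. Returns the packed output words
/// and the number of valid bits in them.
pub fn filter_words(source: &[u64], predicate: &[u64]) -> (Vec<u64>, usize) {
    let mut out = Vec::new();
    let mut acc = 0u64; // pending output bits, LSB first
    let mut acc_len = 0u32; // number of valid bits in `acc`, always < 64
    let mut total = 0usize;

    for (&s, &p) in source.iter().zip(predicate) {
        let n = p.count_ones();
        if n == 0 {
            continue;
        }
        let bits = compress(s, p); // only the low `n` bits can be set
        total += n as usize;
        acc |= bits << acc_len;
        if acc_len + n >= 64 {
            out.push(acc);
            let used = 64 - acc_len; // bits of `bits` that fit in `acc`
            // Shifting a u64 by 64 would overflow, so handle it explicitly.
            acc = if used == 64 { 0 } else { bits >> used };
            acc_len = acc_len + n - 64;
        } else {
            acc_len += n;
        }
    }
    if acc_len > 0 {
        out.push(acc);
    }
    (out, total)
}
```

For example, `compress(0b1011_0010, 0b0011_0110)` selects source bits 1, 2, 4 and 5 and yields `0b1101`. A real implementation would still need the existing paths for non-zero offsets and trailing partial words, and would likely keep slice copying for long runs of set predicate bits.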
   
   **Describe alternatives you've considered**
   The existing index and slice strategies are general and correct, and slice 
copying remains good for long contiguous true runs. Another option is to keep 
the helper local to parquet, but 
[`filter_null_mask`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L578-L588)
 lives in `arrow-select`, so sharing it from `arrow-buffer` seems more reusable.
   
   **Additional context**
   PR [#9848](https://github.com/apache/arrow-rs/pull/9848) adds a 
parquet-local `compress` helper for compacting validity bits while decoding 
definition levels. The same primitive appears applicable to filtering 
[`BooleanBuffer`](https://github.com/apache/arrow-rs/blob/main/arrow-buffer/src/buffer/boolean.rs)
 values and null masks in 
[`arrow-select::filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640),
 especially for boolean arrays and filtered validity buffers.
   

