alamb opened a new issue, #10098: URL: https://github.com/apache/arrow-rs/issues/10098
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

[`arrow-select::filter::filter_null_mask`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L578-L588) filters validity bitmaps through [`filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640), which currently gathers selected bits via index iteration or copies contiguous slices. For dense or irregular predicates, this can do more per-bit work than necessary when compacting a source bitmap by a predicate bitmap.

**Describe the solution you'd like**

Add or reuse a `compress(value: u64, mask: u64) -> u64` bit utility, equivalent to Intel BMI2 [`_pext_u64`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_pext_u64), in `arrow-buffer` and use it in [`arrow-select::filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640). The implementation could process 64-bit chunks of `(source_bits, predicate_bits)`, append `compress(source_bits, predicate_bits)` with `predicate_bits.count_ones()` bits, and fall back to existing handling for offsets/remainders.

**Describe alternatives you've considered**

The existing index and slice strategies are general and correct, and slice copying remains good for long contiguous true runs. Another option is to keep the helper local to parquet, but [`filter_null_mask`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L578-L588) lives in `arrow-select`, so sharing it from `arrow-buffer` seems more reusable.

**Additional context**

PR [#9848](https://github.com/apache/arrow-rs/pull/9848) adds a parquet-local `compress` helper for compacting validity bits while decoding definition levels.
The same primitive appears applicable to filtering [`BooleanBuffer`](https://github.com/apache/arrow-rs/blob/main/arrow-buffer/src/buffer/boolean.rs) values and null masks in [`arrow-select::filter_bits`](https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L600-L640), especially for boolean arrays and filtered validity buffers.
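
To make the proposal concrete, here is a minimal sketch of the chunked approach described in this issue. It is not the code from PR #9848 or from `arrow-select`. The portable `compress` below is a plain reference loop with the same result as `_pext_u64` (a real implementation would presumably dispatch to the BMI2 instruction or use a branch-free fallback), and `filter_words` is a hypothetical helper that assumes both bitmaps are word-aligned `u64` slices with LSB-first bit order and no offsets or remainders.

```rust
/// Portable reference for the proposed `compress` utility: gathers the bits
/// of `value` selected by `mask` into the low bits of the result, matching
/// the semantics of BMI2 PEXT. Bits above `mask.count_ones()` are zero.
pub fn compress(value: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut out_pos = 0u32;
    while mask != 0 {
        // Index of the lowest selected bit still remaining in the mask.
        let bit = mask.trailing_zeros();
        out |= ((value >> bit) & 1) << out_pos;
        out_pos += 1;
        // Clear the lowest set bit of the mask.
        mask &= mask - 1;
    }
    out
}

/// Hypothetical helper (not an arrow-rs API): filters `source` by `predicate`
/// one 64-bit word at a time and packs the selected bits LSB-first.
/// Returns the packed words and the number of valid bits in them.
pub fn filter_words(source: &[u64], predicate: &[u64]) -> (Vec<u64>, usize) {
    assert_eq!(source.len(), predicate.len());
    let mut out: Vec<u64> = Vec::new();
    let mut len = 0usize; // number of valid bits written so far

    for (&s, &p) in source.iter().zip(predicate) {
        let n = p.count_ones() as usize;
        if n == 0 {
            continue; // nothing selected in this chunk
        }
        let bits = compress(s, p); // only the low `n` bits can be set
        let shift = len % 64;
        if shift == 0 {
            // Output is word-aligned: start a new word.
            out.push(bits);
        } else {
            // Fill the partially used last word, then spill any overflow.
            *out.last_mut().unwrap() |= bits << shift;
            if shift + n > 64 {
                out.push(bits >> (64 - shift));
            }
        }
        len += n;
    }
    (out, len)
}
```

For example, `compress(0b1011_0110, 0b0110_1100)` selects bits 2, 3, 5 and 6 of the value and yields `0b0101`. Each chunk costs one `compress` and one or two word writes regardless of how the selected bits are distributed, which is where the benefit over per-index gathering would come from for dense or irregular predicates.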
