joe-ucp opened a new pull request, #8877: URL: https://github.com/apache/arrow-rs/pull/8877
## What issue does this PR close?

- **Part of**: [#8806](https://github.com/apache/arrow/issues/8806) (Bitwise API consolidation). This PR extracts the `nullif` bitmap/layout improvements from the larger bitwise PR for independent review.

---

## Rationale

This PR addresses three key goals:

1. **Align `nullif` logic:** Refactor `nullif` to use the same core buffer-level bitwise primitives employed elsewhere.
2. **Document bitmap layout:** Encode the `ArrayData` bitmap layout contract in documentation and implementation, clarifying:
   - Bit numbering and endianness
   - Mapping of offsets and lengths to validity bits
   - Explicit meaning of NULLIF validity (`V(i) & !C(i)`)
3. **Robust handling of offsets/nulls:** Fix and guard against bugs involving offsets and null counts, particularly for sliced arrays and nested types.

---

## Summary of Changes

- **Documentation:**
  - Add `docs/arraydata_bitmap_layout_contract.md` describing ArrayData bitmap invariants and nullif validity.
- **Buffer helpers:**
  - Extend `arrow-buffer/src/buffer/immutable.rs` with `Buffer::bitwise_binary` and `Buffer::bitwise_unary`.
  - Add focused tests verifying these helpers against legacy bitmap ops, including edge cases (offsets/tail bits).
- **nullif kernel refactor:**
  - Update `arrow-select/src/nullif.rs` to:
    - Compute NULLIF validity via a dedicated function using the new buffer helpers.
    - Always create result `ArrayData` with `offset = 0`, with buffers aligned to logical index 0.
    - Correctly handle offsets and null counts for sliced arrays, booleans, strings, and structs.
  - Keep changes localized to `nullif`; no changes made to other kernels.

---

## Testing

- `cargo test -p arrow-select --lib nullif -- --nocapture`
- `cargo test -p arrow-select --lib`
- `cargo test -p arrow-buffer --lib`
- `cargo test -p arrow-arith --lib`

All tests pass.
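For reviewers, the validity rule `V(i) & !C(i)` from the Rationale can be illustrated with a small standalone sketch. This is not the code in this PR and does not use `Buffer::bitwise_binary`; the names `get_bit`, `set_bit`, and `nullif_validity` are hypothetical. It assumes Arrow's LSB-first bit numbering and that `C(i)` already treats a null condition as false. It shows how a bit offset maps logical index `i` to a physical bit, and how the result is written at offset 0 with zeroed tail bits.

```rust
/// Read bit `i` of an LSB-first packed bitmap:
/// bit `i` lives in byte `i / 8` at position `i % 8`.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Set bit `i` of an LSB-first packed bitmap.
fn set_bit(bytes: &mut [u8], i: usize) {
    bytes[i / 8] |= 1 << (i % 8);
}

/// Hypothetical helper (illustration only): compute `V(i) & !C(i)` for
/// `len` logical elements.
///
/// - `validity`: optional (bitmap, bit offset); `None` means all valid.
/// - `cond`: (bitmap, bit offset) of the condition, nulls already false.
///
/// Returns a bitmap aligned to logical index 0 and its null count.
fn nullif_validity(
    validity: Option<(&[u8], usize)>,
    cond: (&[u8], usize),
    len: usize,
) -> (Vec<u8>, usize) {
    // Zero-initialised, so any tail bits past `len` stay unset.
    let mut out = vec![0u8; (len + 7) / 8];
    let mut null_count = 0;
    for i in 0..len {
        // Logical index `i` maps to physical bit `offset + i`.
        let v = validity.map_or(true, |(bits, off)| get_bit(bits, off + i));
        let c = get_bit(cond.0, cond.1 + i);
        if v && !c {
            set_bit(&mut out, i); // output is written at offset 0
        } else {
            null_count += 1;
        }
    }
    (out, null_count)
}

fn main() {
    // Validity bits 1..=5 (offset 1, len 5): V = [1, 1, 0, 1, 1]
    let validity = [0b0111_0110u8];
    // Condition bits 2..=6 (offset 2, len 5): C = [1, 0, 1, 0, 0]
    let cond = [0b0001_0100u8];

    let (bits, null_count) =
        nullif_validity(Some((&validity[..], 1)), (&cond[..], 2), 5);

    // V & !C = [0, 1, 0, 1, 1], stored LSB-first from bit 0.
    assert_eq!(bits, vec![0b0001_1010u8]);
    assert_eq!(null_count, 2);
    println!("validity = {:08b}, null_count = {}", bits[0], null_count);
}
```

The bit-by-bit loop is chosen for readability; a buffer-level helper would be expected to process whole bytes or words at a time, which is where offset and tail-bit handling becomes error-prone.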
---

## User-Facing Changes

- **No public API-breaking changes.**
- Normal `nullif` semantics are unchanged, but handling of edge cases with offsets/null bitmaps is now more robust and formally documented.
