joe-ucp opened a new pull request, #8869:
URL: https://github.com/apache/arrow-rs/pull/8869

   Here’s a version that fits their template and avoids mentioning Python:
   
   ---
   
   # Which issue does this PR close?
   
   * Closes #8806.
   
   # Rationale for this change
   
   The bitwise operations used across `arrow-buffer`, `arrow-arith`, and 
`arrow-select` were spread across several overlapping helpers, which made it 
difficult to:
   
   * Know which APIs were the “canonical” bitwise primitives, and
   * Reason about and fix bugs involving offsets and null bitmaps, especially 
in `nullif`.
   
   While working on bitwise kernels, we also found that `nullif` in 
`arrow-select` was fragile around sliced arrays, offsets, and null-count 
handling. This PR centralizes the bitwise logic and makes `nullif` explicitly 
respect the `ArrayData` bitmap layout invariants (length, offset, validity 
mapping).
   
   # What changes are included in this PR?
   
   High-level:
   
   * **Documented the bitmap layout contract**
   
     * Added a documentation file (`docs/arraydata_bitmap_layout_contract.md`) 
that spells out:
   
       * How `len` and `offset` relate to validity bits,
       * How new arrays should be constructed (`offset = 0`),
       * And the `nullif` validity rule: `result_valid(i) = V(i) & !C(i)`.
   
   * **Strengthened core bitwise testing in `arrow-buffer` / `arrow-arith`**
   
     * Added/extended tests that:
   
       * Compare new `Buffer`/`MutableBuffer` bitwise APIs against existing 
helpers across random data,
       * Exercise offsets and tail bits (including `(offset + len)` boundary 
cases),
       * Check that in-place (`MutableBuffer`) operations produce the same bits 
as allocating versions.
     * Added `rand` as a **dev-dependency** of `arrow-arith` to support 
reproducible randomized tests.
   
   * **Refactored `nullif` to follow the layout contract**
   
     * Introduced a helper:
   
       ```rust
       fn compute_nullif_validity(
           left_valid: &Buffer,
           left_offset: usize,
           cond_mask: &Buffer,
           cond_offset: usize,
           len: usize,
       ) -> Buffer
       ```
   
       which implements the core `V(i) & !C(i)` validity logic over buffers and 
bit offsets.
   
     * Updated `nullif` to:
   
       * Derive correct bit offsets from `ArrayData` for the left validity and 
condition mask,
       * Always build new result `ArrayData` with `offset = 0`,
       * Slice value/offset buffers so element 0 in the result corresponds to 
logical index 0 for:
   
         * Primitive arrays,
         * Boolean arrays,
         * Utf8 / LargeUtf8 arrays,
         * Struct arrays (with correct top-level validity propagation).
   
   * **Nullif-focused tests**
   
     * Added a focused unit test for `compute_nullif_validity` that exercises:
   
       * No-offset cases,
       * `(offset + len) % 8 == 0` boundaries,
       * Cross-byte bit ranges.
     * Verified and fixed behavior so that the existing `nullif` tests all 
pass, including:
   
       * Offset regressions for ints and booleans,
       * String and struct slice tests,
       * `null_count` regression tests,
       * The existing `nullif_fuzz` test.
   
   Net effect: `nullif` is now a thin consumer of the documented bitmap layout 
contract and the consolidated bitwise APIs.
   
   # Are these changes tested?
   
   Yes.
   
   The following were run locally:
   
   ```bash
   # Core buffer + arithmetic crates
   cargo test -p arrow-buffer --lib
   cargo test -p arrow-arith --lib
   
   # Select crate, including all nullif tests
   cargo test -p arrow-select --lib
   cargo test -p arrow-select --lib nullif -- --nocapture
   ```
   
   All of the above pass, including the previously failing `nullif` regression 
and fuzz tests.
   
   # Are there any user-facing changes?
   
   There are **no breaking changes** to public APIs.
   
   User-visible effects:
   
   * `nullif` behavior is now strictly aligned with the documented bitmap 
layout contract for:
   
     * Sliced arrays,
     * Offsets,
     * Null masks,
     * Strings and structs.
   * This should only be observable as **bug fixes** in corner cases (e.g., 
nulls at specific offsets, struct/string slices), not API changes.
   
   No new configuration or feature flags are introduced; this is primarily a 
correctness and maintainability improvement around existing behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to