sjhddh opened a new pull request, #10060:
URL: https://github.com/apache/arrow-rs/pull/10060

   # Which issue does this PR close?
   
   - Closes #10034.
   
   # Rationale for this change
   
   `from_ffi` / `from_ffi_and_data_type` realign under-aligned but 
protocol-legal
   C Data Interface buffers (e.g. 8-byte-aligned `Decimal128` from a JVM 
producer)
   by calling `align_buffers()` *after* `ImportedArrowArray::consume()`. Under
   `cfg(feature = "force_validate")` that realignment is unreachable: `consume`
   builds `ArrayData` via `new_unchecked` -> `ArrayDataBuilder::build`, which 
runs
   `validate_data()` whenever `force_validate` is set, rejecting the misaligned
   buffer before the outer realignment can run. The correct order on an import
   boundary is "realign, then validate".
   
   # What changes are included in this PR?
   
   - `ImportedArrowArray::consume` now builds via `ArrayDataBuilder` with
     `.align_buffers(true).skip_validation(true)`, so buffers are realigned 
before
     validation. This works under `force_validate` (which still validates after
     realignment) and preserves the FFI hot path's skip of redundant validation
     otherwise. Applied recursively through `consume_child`.
   - Dropped the now-redundant outer `align_buffers()` calls in `from_ffi` /
     `from_ffi_and_data_type`.
   - Added a regression test in the `arrow` umbrella crate
     (`arrow/tests/ffi_from_ffi.rs`). It belongs there, not in `arrow-array`,
     because `arrow-array`'s `force_validate` feature is empty and does not 
forward
     to `arrow-data/force_validate`; an inline `arrow-array` test never reaches 
the
     `arrow-data` validation gate in CI and would pass even with the fix 
reverted.
     The `arrow` crate's `force_validate` forwards to 
`arrow-data/force_validate`,
     and CI (arrow.yml) runs `cargo test -p arrow 
--features=force_validate,...,ffi`,
     so the new test genuinely exercises the realign-before-validate path. The 
test
     exports an 8-byte-aligned `UInt8` buffer and re-imports it as `Decimal128`,
     then asserts the imported buffer is 16-aligned.
   - Documented that `ArrayData::align_buffers` must be called before 
validation on
     import boundaries.
   
   # Are these changes tested?
   
   Yes. The new regression test `arrow/tests/ffi_from_ffi.rs` fails on `main` 
when
   run under the CI invocation
   `cargo test -p arrow 
--features=force_validate,prettyprint,ipc_compression,ffi,chrono-tz`
   with `InvalidArgumentError("Misaligned buffers[0] in array of type
   Decimal128(10, 2) ... 16 by 8")`, and passes with this fix. (Note: an inline
   `arrow-array` test cannot serve as this guard because 
`arrow-array/force_validate`
   does not forward to `arrow-data/force_validate`; the gate lives in 
`arrow-data`.)
   Verified locally: `cargo test -p arrow-array --features ffi`,
   `cargo test -p arrow-data --all-features`, and the `arrow` force_validate
   invocation above all pass; `cargo fmt --all --check` and
   `cargo clippy -p arrow-array -p arrow-data --all-features --all-targets -D 
warnings`
   are clean.
   
   # Are there any user-facing changes?
   
   No public API changes. Behavior: `from_ffi` / `from_ffi_and_data_type` now 
accept
   spec-conformant under-aligned C Data Interface input under `force_validate`
   builds too (previously rejected). Already-aligned producers are unaffected.
   
   ---
   AI assistance was used to draft this change; I reviewed and verified every 
line,
   including reproducing the failure on a pristine `main` checkout and 
confirming
   the fix and all CI gates locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to