sjhddh opened a new pull request, #10060:
URL: https://github.com/apache/arrow-rs/pull/10060
# Which issue does this PR close?
- Closes #10034.
# Rationale for this change
`from_ffi` / `from_ffi_and_data_type` realign under-aligned but
protocol-legal
C Data Interface buffers (e.g. 8-byte-aligned `Decimal128` from a JVM
producer)
by calling `align_buffers()` *after* `ImportedArrowArray::consume()`. Under
`cfg(feature = "force_validate")` that realignment is unreachable: `consume`
builds `ArrayData` via `new_unchecked` -> `ArrayDataBuilder::build`, which
runs
`validate_data()` whenever `force_validate` is set, rejecting the misaligned
buffer before the outer realignment can run. The correct order on an import
boundary is "realign, then validate".
# What changes are included in this PR?
- `ImportedArrowArray::consume` now builds via `ArrayDataBuilder` with
`.align_buffers(true).skip_validation(true)`, so buffers are realigned
before
validation. This works under `force_validate` (which still validates after
realignment) and preserves the FFI hot path's skip of redundant validation
otherwise. Applied recursively through `consume_child`.
- Dropped the now-redundant outer `align_buffers()` calls in `from_ffi` /
`from_ffi_and_data_type`.
- Added a regression test in the `arrow` umbrella crate
(`arrow/tests/ffi_from_ffi.rs`). It belongs there, not in `arrow-array`,
because `arrow-array`'s `force_validate` feature is empty and does not
forward
to `arrow-data/force_validate`; an inline `arrow-array` test never reaches
the
`arrow-data` validation gate in CI and would pass even with the fix
reverted.
The `arrow` crate's `force_validate` forwards to
`arrow-data/force_validate`,
and CI (arrow.yml) runs `cargo test -p arrow
--features=force_validate,...,ffi`,
so the new test genuinely exercises the realign-before-validate path. The
test
exports an 8-byte-aligned `UInt8` buffer and re-imports it as `Decimal128`,
then asserts the imported buffer is 16-aligned.
- Documented that `ArrayData::align_buffers` must be called before
validation on
import boundaries.
# Are these changes tested?
Yes. The new regression test `arrow/tests/ffi_from_ffi.rs` fails on `main`
when
run under the CI invocation
`cargo test -p arrow
--features=force_validate,prettyprint,ipc_compression,ffi,chrono-tz`
with `InvalidArgumentError("Misaligned buffers[0] in array of type
Decimal128(10, 2) ... 16 by 8")`, and passes with this fix. (Note: an inline
`arrow-array` test cannot serve as this guard because
`arrow-array/force_validate`
does not forward to `arrow-data/force_validate`; the gate lives in
`arrow-data`.)
Verified locally: `cargo test -p arrow-array --features ffi`,
`cargo test -p arrow-data --all-features`, and the `arrow` force_validate
invocation above all pass; `cargo fmt --all --check` and
`cargo clippy -p arrow-array -p arrow-data --all-features --all-targets -D
warnings`
are clean.
# Are there any user-facing changes?
No public API changes. Behavior: `from_ffi` / `from_ffi_and_data_type` now
accept
spec-conformant under-aligned C Data Interface input under `force_validate`
builds too (previously rejected). Already-aligned producers are unaffected.
---
AI assistance was used to draft this change; I reviewed and verified every
line,
including reproducing the failure on a pristine `main` checkout and
confirming
the fix and all CI gates locally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]