brunal opened a new pull request, #46408: URL: https://github.com/apache/arrow/pull/46408
### Rationale for this change Arrow C++ slices arrays by bumping the top-level `offset` value. However, Arrow Rust slices list arrays by slicing the `value_offsets` buffer. When receiving a Rust Arrow Array in C++ (via the C data interface), its IPC serialization fails to notice that the `value_offsets` buffer needed to be updated, but it still updates the `values` buffer. This leads to a corrupt array on deserialization, with an `value_offsets` buffer that points past the end of the values array. This PR fixes the IPC serialization by looking at value_offset(0) to determine whether the `value_offsets` buffer needs reconstructing, instead of just looking at offset(). This works because value_offset(int) is value_offset buffer + top-level offset (through ArrayData::GetValuesSafe(int)). Additionally, I update the comment surrounding the logic, as it had 2 issues: 1. It hints that offset > 0 and value_offsets[0] > 0 happen together, when they actually tend to be exclusive (... unless you slice twice, once in Rust and once in C++). 2. It mentions slicing the values, when that does not happen in the function where the comment appears (GetZeroBasedValueOffsets), but at call site (Visit(Array)). ### What changes are included in this PR? The fix, as well as a small cleanup of the function: * Extend the loop to cover i == array.length() instead of a dedicated statement. * Avoid copying the `offsets_buffer` shared_ptr (avoid refcount ++ then --). Note: I am not sure that the logic covered by the [ARROW-6046](https://issues.apache.org/jira/browse/ARROW-6046) tag does anything. I don't think it should trigger lest the offsets buffer is invalid, and removing it does not cause any test to fail. ### Are these changes tested? Yep ### Are there any user-facing changes? Nope (well, unless they are affected by the bug) **This PR contains a "Critical Fix".** (the changes fix (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced) : valid operations on valid data produce invalid data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org