alamb opened a new pull request #942: URL: https://github.com/apache/arrow-rs/pull/942
# Which issue does this PR close? Resolves https://github.com/apache/arrow-rs/issues/940 # Rationale for this change It appears the arrow-rs slice logic for structs introduced in https://github.com/apache/arrow-rs/pull/389 by @nevi-me effectively slices the child data when a struct array is sliced. The validation code from validate.cc (and perhaps the C++ code) assumes that the children are not sliced, so when I ported that logic over it is not correct for sliced structs. You can see by the comment I was somewhat confused about the need for offset even when it was originally introduced # What changes are included in this PR? 1. Update the struct array validation logic to follow the Rust semantics where offsets are applied to both parent and child 2. Test case from @bjchambers in https://github.com/bjchambers/arrow-rs/blob/reproduce-validation-error/arrow/src/array/data.rs#L1648 # In pictures In pictures, here is what the testcase looks like: ``` 0 ┌─────┐ 0 ┌─────┐ ┌─────┐ 0 ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ 5 │ │ 5 │ │ │ │ 5 │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ Valid Valid i32[] Valid bool[] StructArray Child Array #1 Child Array #2 (Int32Array) (BooleanArray) ``` In rust, when we do `slice(1,3)` the offset is applied to both the ArrayData and its children: ``` 0 ┌─────┐ 0 ┌─────┐ ┌─────┐ 0 ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ │ │ │ ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─ ─ ─ ─ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─ ─ ─ ─ 5 │ │ 5 │ │ │ │ 5 │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ Valid Valid i32[] Valid bool[] StructArray Child Array #1 Child Array #2 (Int32Array) (BooleanArray) ``` However, in the C++ validation logic, the assumption is that the children have no offsets (and the offset of the parent is applied): ``` 0 ┌─────┐ 0 ┌─────┐ ┌─────┐ 0 ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ │ │ │ ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ├─────┤ ├─────┤ ├─────┤ ├─────┤ ├─────┤ │ │ │ │ │ │ │ │ │ │ ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ├─────┤ ├─────┤ ├─────┤ ├─────┤ 5 │ │ 5 │ │ │ │ 5 │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ Valid Valid i32[] Valid bool[] StructArray Child Array #1 Child Array #2 (Int32Array) (BooleanArray) ``` # Are there any user-facing changes? Sliced `StructArrays` can be created without validation errors (or using unsafe) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
