kosiew opened a new issue, #10233: URL: https://github.com/apache/arrow-rs/issues/10233
## Problem

Arrow nested arrays can contain child values that are semantically hidden by a null parent slot. During casts, those hidden child values may still be inspected and can cause the cast to fail, even though the corresponding parent value is null and the child values are not logically visible.

[This was observed](https://github.com/apache/datafusion/pull/22980#discussion_r3486049771) while adding DataFusion support for recursive schema adaptation of `FixedSizeList<Struct>` values.

`FixedSizeListArray` always stores `len * list_size` child slots, including slots for null parent lists. If those hidden child slots contain values that fail the child cast, the nested cast fails unless the caller first masks the hidden child positions to null.

## Why it matters

For nested arrays, parent nulls should hide child contents from value-level cast failures. Otherwise valid arrays can fail casts because of unreachable child values. This can force downstream projects to add local masking workarounds around Arrow casts.

## Invariant / desired behavior

When casting a nested array with nullable parent slots, child values under null parents should not cause value-level cast failures. The cast should preserve the parent nulls and only require visible child values to cast successfully.

Example shape:

```text
FixedSizeList<Struct<a: Utf8>>
list_size = 2

parent validity: [null, valid]
child a values:  ["not_int", "also_bad", "1", "2"]

cast target: FixedSizeList<Struct<a: Int32>>

Expected:
- parent[0] remains null
- child slots 0 and 1 are ignored/masked because parent[0] is null
- parent[1] casts from ["1", "2"] to [1, 2]

Unexpected:
- cast fails on "not_int" / "also_bad" even though parent[0] is null
```

## Proposed direction

Evaluate whether Arrow cast kernels for nested arrays should be parent-null-aware:

- For `FixedSizeList`, expand the parent null bitmap to child positions before recursively casting child values, or otherwise ensure hidden child values cannot fail the cast.
- Consider whether the same invariant should apply to `List`, `LargeList`, `ListView`, and `LargeListView` when null parent slots reference non-empty child ranges.
- Preserve existing behavior for visible child values and for schema/type incompatibilities.

## Scope

### In

- Reproduce hidden-child cast failure for `FixedSizeList` with null parent slots.
- Decide expected Arrow semantics for value-level cast errors under null nested parents.
- Add regression coverage for at least `FixedSizeList`.
- If accepted, update nested cast implementation so hidden child values under null parents do not fail casts.

### Out

- DataFusion-specific schema evolution rules.
- Struct field-addition compatibility policy.
- Planner/runtime parity checks in DataFusion.
- Changing behavior for visible child values that fail to cast.

## Acceptance criteria

- [ ] A `FixedSizeList` cast with invalid child values under null parent slots succeeds when all visible child values are castable.
- [ ] Parent null bitmap is preserved in the cast result.
- [ ] Visible invalid child values still fail or null according to the configured cast options.
- [ ] Type/schema incompatibilities are still rejected.
- [ ] Tests clarify whether the same behavior is expected for `List` / `ListView` families.

## Tests / verification

Suggested regression test:

- Build `FixedSizeList<Struct<a: Utf8>>` with `list_size = 2`.
- Parent validity: first list null, second list valid.
- Child `a` values: `"not_int"`, `"also_bad"`, `"1"`, `"2"`.
- Cast to `FixedSizeList<Struct<a: Int32>>`.
- Assert first parent remains null, second parent is valid, and visible values cast to `1`, `2`.
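
The masking step described under "Proposed direction" can be modelled in plain Rust without any arrow-rs types. The sketch below is illustrative only and is not the arrow-rs implementation. `expand_parent_validity` and `cast_visible_children` are hypothetical helper names, and `&[bool]` / `Option<&str>` stand in for Arrow's null buffers and a `Utf8` child array. It assumes strict cast semantics, where a visible value that fails to parse is an error.

```rust
/// Expand a per-list validity mask to one flag per child slot.
/// Each parent flag is repeated `list_size` times, mirroring the fixed-size
/// layout in which every parent (null or not) owns `list_size` child slots.
fn expand_parent_validity(parent_valid: &[bool], list_size: usize) -> Vec<bool> {
    parent_valid
        .iter()
        .flat_map(|&valid| std::iter::repeat(valid).take(list_size))
        .collect()
}

/// Cast child strings to i32, skipping slots hidden by a null parent.
/// Hidden slots become `None` without ever being parsed. A visible value
/// that fails to parse is reported as an error (strict cast semantics,
/// an assumption of this sketch).
fn cast_visible_children(
    children: &[Option<&str>],
    child_visible: &[bool],
) -> Result<Vec<Option<i32>>, String> {
    assert_eq!(children.len(), child_visible.len());
    children
        .iter()
        .zip(child_visible)
        .map(|(value, &visible)| match (visible, value) {
            // Hidden by a null parent, or already null: nothing to cast.
            (false, _) | (true, None) => Ok(None),
            // Visible value: it must cast successfully.
            (true, Some(s)) => s
                .parse::<i32>()
                .map(Some)
                .map_err(|e| format!("cannot cast {s:?} to Int32: {e}")),
        })
        .collect()
}
```

With the example shape from this issue (parent validity `[false, true]`, `list_size = 2`), the expanded mask is `[false, false, true, true]`, so `"not_int"` and `"also_bad"` are never parsed and the child result is `[None, None, Some(1), Some(2)]`. Passing an all-`true` mask over the same children returns an error, which corresponds to the "visible invalid child values still fail" criterion.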
