schenksj opened a new pull request, #22658: URL: https://github.com/apache/datafusion/pull/22658
## Which issue does this PR close?

- Closes #22366.

## Rationale for this change

`make_array` panics with `Arrays with inconsistent types passed to MutableArrayData` when combining values whose types are identical except for the nullability of a nested field, for example a struct field that is non-nullable when constructed from a literal but nullable when read from a column. This blocks native execution of plans that construct structs from multiple sources (Delta Lake CDC writes, `UNION` within arrays), forcing workarounds that sacrifice performance.

Spark, Postgres, and Arrow's own `concat` all handle this by widening nullable flags rather than enforcing strict type equality. This PR brings `make_array` in line with that precedent: inputs that differ only in nested-field nullability are accepted, and the result widens nullable flags to `true` at every nesting level.

## What changes are included in this PR?

- Add `merge_nullability`, which OR-s nullable flags at every nesting level (struct fields, list elements, ...) using Arrow's `Field::try_merge`, and returns `None` (preserving prior behavior) for structurally incompatible inputs.
- `array_array` (the runtime shared by both `make_array` and Spark's `array`) computes a merged element type that is a supertype of all arguments and cheaply casts each argument up to it before building the list, so `MutableArrayData` no longer sees inconsistent types.
- `coerce_types_inner` widens the per-argument struct types produced by `try_type_union_resolution_with_struct` to a single common type, so the declared return type matches the value produced at runtime.

## Are these changes tested?

Yes:

- A new unit test (`make_array_relaxes_nested_field_nullability`) reproduces the original panic at the `make_array_inner` boundary and asserts it now succeeds.
- New sqllogictest coverage in `array/make_array.slt` for `make_array` over flat and nested structs.

Note: the SQL planner already normalizes nested-field nullability for struct construction from SQL literals/columns, so the panic is reached from sources with declared non-null nested schemas (e.g. Delta Lake CDC); the unit test exercises that path directly.

## Are there any user-facing changes?

`make_array` (and Spark `array`) now succeed on inputs that previously panicked. There are no breaking API changes; the result type simply widens nested nullable flags where inputs disagree.
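
The widening rule described under "What changes are included in this PR?" can be illustrated with a minimal, self-contained sketch. It deliberately does not use Arrow: `Ty`, `Field`, `field`, `merge_ty`, `merge_field`, and `merge_all` are hypothetical stand-ins invented for this example, not the PR's `merge_nullability` or Arrow's `Field::try_merge`. It only models the idea of OR-ing nullable flags at every nesting level and returning `None` when the shapes differ.

```rust
/// Toy stand-in for a nested data type (an assumption, not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

/// Toy stand-in for a named, possibly nullable field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    ty: Ty,
    nullable: bool,
}

/// Convenience constructor used only in this sketch.
fn field(name: &str, ty: Ty, nullable: bool) -> Field {
    Field { name: name.to_string(), ty, nullable }
}

/// Merge two types that may differ only in nested nullability.
/// Returns `None` when the shapes are structurally incompatible.
fn merge_ty(a: &Ty, b: &Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::Int, Ty::Int) => Some(Ty::Int),
        (Ty::Utf8, Ty::Utf8) => Some(Ty::Utf8),
        // List elements are merged recursively.
        (Ty::List(x), Ty::List(y)) => Some(Ty::List(Box::new(merge_field(x, y)?))),
        // Struct fields are merged pairwise, in order.
        (Ty::Struct(xs), Ty::Struct(ys)) if xs.len() == ys.len() => {
            let merged: Option<Vec<Field>> = xs
                .iter()
                .zip(ys)
                .map(|(x, y)| merge_field(x, y))
                .collect();
            merged.map(Ty::Struct)
        }
        _ => None,
    }
}

/// Merge two fields: names must match, nullable flags are OR-ed.
fn merge_field(a: &Field, b: &Field) -> Option<Field> {
    if a.name != b.name {
        return None;
    }
    Some(Field {
        name: a.name.clone(),
        ty: merge_ty(&a.ty, &b.ty)?,
        nullable: a.nullable || b.nullable,
    })
}

/// Fold all argument types into one common supertype, if one exists.
/// Returns `None` for an empty slice or for incompatible inputs.
fn merge_all(types: &[Ty]) -> Option<Ty> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(first.clone(), |acc, t| merge_ty(&acc, t))
}
```

In this model, merging a struct whose field `a` is non-nullable with an otherwise identical struct whose field `a` is nullable yields the nullable variant, while merging a struct with a plain `Utf8` yields `None`. That mirrors the behavior the PR describes: widen where only nullability disagrees, and keep the prior behavior for structurally incompatible inputs.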
