schenksj opened a new pull request, #22658:
URL: https://github.com/apache/datafusion/pull/22658

   ## Which issue does this PR close?
   
   - Closes #22366.
   
   ## Rationale for this change
   
   `make_array` panics with `Arrays with inconsistent types passed to 
MutableArrayData` when combining values whose types are identical except for 
the nullability of a nested field — for example a struct field that is 
non-nullable when constructed from a literal but nullable when read from a 
column.
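
The failure mode can be illustrated with a minimal sketch in a hypothetical, simplified type model. The `DataType`/`Field` types below are stand-ins, not the arrow-rs types, and `all_types_equal` is a hypothetical helper. Like Arrow's `Field`, the derived equality here compares the `nullable` flag. As a result, two struct types that differ only in a nested field's nullability compare unequal, and they fail an "all inputs share one type" check of the kind the panic message describes:

```rust
/// Hypothetical, simplified stand-in for Arrow's type model (not arrow-rs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Struct(Vec<Field>),
}

/// Derived PartialEq compares name, data_type AND nullable.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// Hypothetical strict check: every input must have exactly the same type.
fn all_types_equal(types: &[DataType]) -> bool {
    types.windows(2).all(|w| w[0] == w[1])
}

fn main() {
    // Inner field declared non-nullable in one input...
    let strict = DataType::Struct(vec![field("a", DataType::Int64, false)]);
    // ...and nullable in the other; otherwise identical.
    let loose = DataType::Struct(vec![field("a", DataType::Int64, true)]);

    let equal = all_types_equal(&[strict, loose]);
    println!("equal: {}", equal); // prints "equal: false"
}
```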
   
   This blocks native execution of plans that construct structs from multiple 
sources (Delta Lake CDC writes, `UNION` within arrays), forcing workarounds 
that sacrifice performance.
   
   Spark, Postgres, and Arrow's own `concat` all handle this by widening 
nullable flags rather than enforcing strict type equality. This PR brings 
`make_array` in line with that precedent: inputs that differ only in 
nested-field nullability are accepted, and the result widens nullable flags to 
`true` at every nesting level.
   
   ## What changes are included in this PR?
   
   - Add `merge_nullability`, which ORs nullable flags at every nesting level
(struct fields, list elements, ...) using Arrow's `Field::try_merge`, and
returns `None` (preserving prior behavior) for structurally incompatible inputs.
   - `array_array` (the runtime implementation shared by `make_array` and Spark's
`array`) computes a merged element type that is a supertype of all argument
types and cheaply casts each argument up to it before building the list, so
`MutableArrayData` no longer sees inconsistent types.
   - `coerce_types_inner` widens the per-argument struct types produced by 
`try_type_union_resolution_with_struct` to a single common type, so the 
declared return type matches the value produced at runtime.
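
A minimal sketch of the merging rule, written in a hypothetical, simplified type model. The `DataType`/`Field` types, `merge_field`, and `merged_element_type` are illustrative stand-ins, not arrow-rs or DataFusion code. The pairwise merge is hand-rolled rather than built on Arrow's `Field::try_merge`, so its handling of mismatched struct fields may differ from the PR's helper. The sketch ORs nullable flags through struct fields and list elements and returns `None` when the shapes differ. It then folds the pairwise merge across all arguments to obtain one element type that every argument can be widened to:

```rust
/// Hypothetical, simplified stand-in for Arrow's type model (not arrow-rs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// Same name required; child types merged recursively; nullable flags ORed.
fn merge_field(a: &Field, b: &Field) -> Option<Field> {
    if a.name != b.name {
        return None;
    }
    Some(Field {
        name: a.name.clone(),
        data_type: merge_nullability(&a.data_type, &b.data_type)?,
        nullable: a.nullable || b.nullable,
    })
}

/// Sketch of a merge_nullability-style helper: a supertype of `a` and `b`
/// that differs from them only in nested nullable flags, or None if the
/// two types are structurally incompatible.
fn merge_nullability(a: &DataType, b: &DataType) -> Option<DataType> {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            Some(DataType::List(Box::new(merge_field(fa, fb)?)))
        }
        (DataType::Struct(fa), DataType::Struct(fb)) => {
            if fa.len() != fb.len() {
                return None;
            }
            // Fields are matched by position; any failed pair aborts the merge.
            let merged = fa
                .iter()
                .zip(fb)
                .map(|(x, y)| merge_field(x, y))
                .collect::<Option<Vec<_>>>()?;
            Some(DataType::Struct(merged))
        }
        // Leaf types must match exactly.
        (x, y) if x == y => Some(x.clone()),
        _ => None,
    }
}

/// Fold the pairwise merge over every argument type; None if the list is
/// empty or any pair is incompatible.
fn merged_element_type(types: &[DataType]) -> Option<DataType> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(first.clone(), |acc, t| merge_nullability(&acc, t))
}

fn main() {
    let item = |nullable| Box::new(field("item", DataType::Utf8, nullable));
    let strict = DataType::Struct(vec![
        field("a", DataType::Int64, false),
        field("tags", DataType::List(item(false)), false),
    ]);
    let loose = DataType::Struct(vec![
        field("a", DataType::Int64, true),
        field("tags", DataType::List(item(true)), false),
    ]);
    // Field "a" and the list element become nullable; "tags" stays non-nullable.
    println!("{:?}", merged_element_type(&[strict, loose]));
}
```

Because a successful merge only ever flips nullable flags from `false` to `true`, the folded result does not depend on argument order, and each argument differs from it only in nullability, which is presumably what keeps the per-argument cast cheap.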
   
   ## Are these changes tested?
   
   Yes:
   - A new unit test (`make_array_relaxes_nested_field_nullability`) reproduces
the original panic at the `make_array_inner` boundary and asserts that the call
now succeeds.
   - New sqllogictest coverage in `array/make_array.slt` for `make_array` over 
flat and nested structs.
   
   Note: the SQL planner already normalizes nested-field nullability when
constructing structs from SQL literals and columns, so the panic is reached
through sources that declare non-nullable nested fields in their schemas (e.g.
Delta Lake CDC); the unit test exercises that path directly.
   
   ## Are there any user-facing changes?
   
   `make_array` (and Spark `array`) now succeed on inputs that previously 
panicked. There are no breaking API changes; the result type simply widens 
nested nullable flags where inputs disagree.


-- 
This is an automated message from the Apache Git Service.