schenksj opened a new issue, #4528:
URL: https://github.com/apache/datafusion-comet/issues/4528

   ### Problem
   
   DataFusion's `make_array` asserts strict element-type equality (down to 
nested-field nullability) in `MutableArrayData::with_capacities`. Spark's 
`CreateArray` is more permissive: its children can share a surface type (e.g. all 
`Struct<id, b, _change_type>`) yet still differ in nested-field nullability when 
the analyzer inserts no coercion cast. Native execution of such a plan 
**panics** inside `make_array_inner`.
   
   Reproducible with manually built `array(struct(...), struct(...))` plans 
where one arm leaves a field non-nullable and another arm leaves the same field 
nullable.
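
The mismatch can be illustrated with a small, self-contained model. This is only a sketch: `DataType`, `Field`, `same_surface`, and `change_row` are simplified, hypothetical stand-ins rather than the real arrow-rs or DataFusion types, and the column types chosen for `id`, `b`, and `_change_type` are assumptions. It shows two struct types that agree on names and types but compare unequal under strict equality because one nested field differs in nullability.

```rust
// Simplified stand-ins for Arrow's DataType/Field; NOT the real arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// "Surface" equality: same names and types, ignoring nested-field nullability.
fn same_surface(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Struct(fa), DataType::Struct(fb)) => {
            fa.len() == fb.len()
                && fa.iter().zip(fb).all(|(x, y)| {
                    x.name == y.name && same_surface(&x.data_type, &y.data_type)
                })
        }
        _ => a == b,
    }
}

/// Hypothetical struct arm; only `_change_type` nullability varies.
/// The column types here are assumed for illustration.
fn change_row(change_type_nullable: bool) -> DataType {
    DataType::Struct(vec![
        field("id", DataType::Int64, false),
        field("b", DataType::Utf8, true),
        field("_change_type", DataType::Utf8, change_type_nullable),
    ])
}

fn main() {
    let arm_a = change_row(false);
    let arm_b = change_row(true);

    // Both arms look like Struct<id, b, _change_type> on the surface...
    assert!(same_surface(&arm_a, &arm_b));
    // ...but a strict comparison that includes nullability rejects them.
    assert_ne!(arm_a, arm_b);
}
```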
   
   ### Proposed fix
   
   In `CometCreateArray`, when the children's `dataType`s are not all 
identical, decline serialization (`withInfo`) so Spark's JVM evaluator -- which 
has no such strictness -- handles it.
   
   This tracks upstream apache/datafusion#22366; the caller-side decline can be 
removed once that fix lands (it will widen the element type via 
nullability-OR-merge and cast each child before `MutableArrayData`).
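
Both halves of this plan can be sketched with the same kind of simplified model. The code below is an assumption-laden illustration, not the Comet or DataFusion implementation: the actual guard would live in Scala inside `CometCreateArray`, and `DataType`, `Field`, `all_identical`, `merge`, and `change_row` are hypothetical names introduced here. `all_identical` models the caller-side check that decides whether to decline, and `merge` models one possible nullability-OR-merge that widens the element type.

```rust
// Simplified stand-ins for Arrow types; NOT the real arrow-rs or DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// Caller-side guard: true when every child has exactly the same type,
/// including nested-field nullability. The proposed fix declines otherwise.
fn all_identical(children: &[DataType]) -> bool {
    children.windows(2).all(|w| w[0] == w[1])
}

/// Hypothetical nullability-OR-merge: returns the widened type, or None when
/// the two types differ in more than nullability.
fn merge(a: &DataType, b: &DataType) -> Option<DataType> {
    match (a, b) {
        (DataType::Struct(fa), DataType::Struct(fb)) if fa.len() == fb.len() => {
            let mut merged = Vec::with_capacity(fa.len());
            for (x, y) in fa.iter().zip(fb) {
                if x.name != y.name {
                    return None;
                }
                merged.push(Field {
                    name: x.name.clone(),
                    data_type: merge(&x.data_type, &y.data_type)?,
                    // Nullable in the result if nullable in either input.
                    nullable: x.nullable || y.nullable,
                });
            }
            Some(DataType::Struct(merged))
        }
        _ if a == b => Some(a.clone()),
        _ => None,
    }
}

/// Hypothetical struct arm; the column types are assumed for illustration.
fn change_row(change_type_nullable: bool) -> DataType {
    DataType::Struct(vec![
        field("id", DataType::Int64, false),
        field("b", DataType::Utf8, true),
        field("_change_type", DataType::Utf8, change_type_nullable),
    ])
}

fn main() {
    let children = vec![change_row(false), change_row(true)];

    // The guard sees divergent types, so the caller would decline serialization.
    assert!(!all_identical(&children));

    // The widened element type marks `_change_type` as nullable.
    let widened = merge(&children[0], &children[1]).expect("differ only in nullability");
    assert_eq!(widened, change_row(true));

    // Types that differ in more than nullability cannot be merged.
    assert_eq!(merge(&DataType::Int64, &DataType::Utf8), None);
}
```

In this model, each child would then be cast to the widened type before the arrays are concatenated, which is the step the issue attributes to the upstream fix.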
   
   ### Relationship to the Delta integration
   
   Standalone guard against a native panic. It is **surfaced by** the 
in-progress Delta Lake contrib integration (Delta's CDC write path builds one 
struct per change type, leaving `_change_type` nullability divergent across 
arms), so it would help to prioritize it alongside that work. A PR will follow 
shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

