schenksj opened a new issue, #4528: URL: https://github.com/apache/datafusion-comet/issues/4528
### Problem

DataFusion's `make_array` asserts strict element-type equality (down to nested-field nullability) in `MutableArrayData::with_capacities`. Spark's `CreateArray` is more permissive: children can share a surface type (e.g. all `Struct<id, b, _change_type>`) yet differ only in nested-field nullability when the analyzer inserted no coercion cast. Native execution then **panics** inside `make_array_inner`.

Reproducible with manually-built `array(struct(...), struct(...))` plans where one arm leaves a field non-nullable and another nullable.

### Proposed fix

In `CometCreateArray`, when the children's `dataType`s are not all identical, decline serialization (`withInfo`) so Spark's JVM evaluator, which has no such strictness, handles it.

This tracks upstream apache/datafusion#22366; the caller-side decline can be removed once that fix lands (it will widen the element type via nullability-OR-merge and cast each child before `MutableArrayData`).

### Relationship to the Delta integration

Standalone guard against a native panic. It is **surfaced by** the in-progress Delta Lake contrib integration (Delta's CDC write path builds one struct per change type, leaving `_change_type` nullability divergent across arms), so it would help to prioritize it alongside that work.

A PR will follow shortly.
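
### Illustrative sketch

To make the two behaviours concrete, here is a minimal, self-contained Python model. It is not Comet, Spark, or DataFusion code: the `Primitive`, `Field`, and `Struct` classes, the helper names `all_identical` and `merge_nullability`, and the field types chosen for `id`, `b`, and `_change_type` are all assumptions made for illustration. `all_identical` mirrors the proposed caller-side check (decline when the children's types are not all identical), and `merge_nullability` is one plausible reading of the nullability-OR-merge mentioned for the upstream fix.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class Primitive:
    """Hypothetical stand-in for a leaf type such as long or string."""
    name: str


@dataclass(frozen=True)
class Field:
    """Hypothetical struct field: name, type, and nullability flag."""
    name: str
    dtype: "DataType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    """Hypothetical struct type made of ordered fields."""
    fields: Tuple[Field, ...]


DataType = Union[Primitive, Struct]


def all_identical(types: Sequence[DataType]) -> bool:
    """Strict check: equal types, including nested-field nullability."""
    return all(t == types[0] for t in types[1:])


def merge_nullability(a: DataType, b: DataType) -> Optional[DataType]:
    """Widen two types that differ only in nested nullability.

    Returns None when the types differ in anything else (names, arity,
    leaf types), in which case no widening is attempted.
    """
    if isinstance(a, Primitive) and isinstance(b, Primitive):
        return a if a == b else None
    if isinstance(a, Struct) and isinstance(b, Struct):
        if len(a.fields) != len(b.fields):
            return None
        merged = []
        for fa, fb in zip(a.fields, b.fields):
            if fa.name != fb.name:
                return None
            inner = merge_nullability(fa.dtype, fb.dtype)
            if inner is None:
                return None
            # A merged field is nullable if either side is nullable.
            merged.append(Field(fa.name, inner, fa.nullable or fb.nullable))
        return Struct(tuple(merged))
    return None


# Two arms with the same surface type; only `_change_type` nullability differs.
# The leaf types here are assumed, not taken from the issue.
arm_a = Struct((
    Field("id", Primitive("long"), False),
    Field("b", Primitive("string"), True),
    Field("_change_type", Primitive("string"), False),
))
arm_b = Struct((
    Field("id", Primitive("long"), False),
    Field("b", Primitive("string"), True),
    Field("_change_type", Primitive("string"), True),
))

# The strict check fails, which is the case the proposed guard declines.
decline_native = not all_identical([arm_a, arm_b])

# The widened element type marks `_change_type` as nullable.
widened = merge_nullability(arm_a, arm_b)
```

In this model the guard declines exactly the inputs that the strict native check would reject, while the merge shows why the decline could become unnecessary once the element type is widened and each child is cast to it.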
