schenksj opened a new issue, #22366:
URL: https://github.com/apache/datafusion/issues/22366
## Summary

`make_array` (in `datafusion-functions-nested`) panics when called with arrays whose element types have the same shape but differ in nested-field nullability. Spark, Postgres, and `arrow::compute::concat` all accept this and widen `nullable` to `true` in the result type. DataFusion's `make_array_inner` is stricter, and the panic propagates up to any caller that builds `array(...)` over child expressions produced by different sources.

## Repro symptom

This surfaced in [apache/datafusion-comet](https://github.com/apache/datafusion-comet) on a Delta Lake CDF write that builds `array(struct(id, b, _change_type=lit("delete")), struct(id, b, _change_type=col(...)))`. In one arm `_change_type` is a non-nullable `Utf8` (from a literal); in the other it is a nullable `Utf8`:

```
panicked at arrow-data-58.2.0/src/transform/mod.rs:422:
assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
  left: Struct([Field { name: "id", data_type: Int64, nullable: true }, Field { name: "b", data_type: Int32 }, Field { name: "_change_type", data_type: Utf8 }])
 right: Struct([Field { name: "id", data_type: Int64, nullable: true }, Field { name: "b", data_type: Int32 }, Field { name: "_change_type", data_type: Utf8, nullable: true }])
```

Stack: `make_array_inner` → `MutableArrayData::with_capacities`.

## Proposal

`make_array` should accept element types that are equal under nullability widening (recursively, for nested structs, lists, and maps). Concretely:

- Compute the merged element type by walking each child's `DataType` and OR-ing the `nullable` flag at every level (essentially `Field::try_merge` minus the type-promotion arm).
- Cast each child to the merged type before handing it to `MutableArrayData`.
- Return an `ArrayType` with `containsNull = true` (in Arrow terms, a `List` whose item field is nullable) if any merge widened a nullability flag.
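The merge step in the first bullet can be sketched as follows. This is a minimal, self-contained model, not arrow-rs: `DataType`, `Field`, `merge_field`, `merge_type`, and `merge_all` are hypothetical stand-ins, `Map` is simplified to a single entries field, and leaf types must match exactly (there is no type-promotion arm). The example uses the two struct types from the panic message.

```rust
// Hypothetical, simplified stand-ins for Arrow's DataType/Field (NOT arrow-rs),
// with just enough structure to show recursive nullability widening.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
    Map(Box<Field>), // simplified: only the entries field, no `sorted` flag
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

/// Merges two fields that differ at most in nullability, OR-ing `nullable`.
/// Returns None if the names or the underlying shapes differ.
fn merge_field(a: &Field, b: &Field) -> Option<Field> {
    if a.name != b.name {
        return None;
    }
    Some(Field {
        name: a.name.clone(),
        data_type: merge_type(&a.data_type, &b.data_type)?,
        nullable: a.nullable || b.nullable,
    })
}

/// Recursively merges two types; nested fields are merged with `merge_field`.
fn merge_type(a: &DataType, b: &DataType) -> Option<DataType> {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            Some(DataType::List(Box::new(merge_field(fa, fb)?)))
        }
        (DataType::Map(fa), DataType::Map(fb)) => {
            Some(DataType::Map(Box::new(merge_field(fa, fb)?)))
        }
        (DataType::Struct(xs), DataType::Struct(ys)) => {
            if xs.len() != ys.len() {
                return None;
            }
            let merged = xs
                .iter()
                .zip(ys)
                .map(|(x, y)| merge_field(x, y))
                .collect::<Option<Vec<_>>>()?;
            Some(DataType::Struct(merged))
        }
        // Leaf types must be identical: no type-promotion arm.
        _ if a == b => Some(a.clone()),
        _ => None,
    }
}

/// Folds `merge_type` over every child's element type.
fn merge_all(types: &[DataType]) -> Option<DataType> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(first.clone(), |acc, t| merge_type(&acc, t))
}

fn main() {
    // The two element types reported in the panic.
    let children = vec![
        DataType::Struct(vec![
            field("id", DataType::Int64, true),
            field("b", DataType::Int32, false),
            field("_change_type", DataType::Utf8, false),
        ]),
        DataType::Struct(vec![
            field("id", DataType::Int64, true),
            field("b", DataType::Int32, false),
            field("_change_type", DataType::Utf8, true),
        ]),
    ];
    // Strict equality, the kind of check MutableArrayData asserts, fails here.
    assert_ne!(children[0], children[1]);

    let merged = merge_all(&children).expect("same shape modulo nullability");
    // True if any child had to be widened to reach the merged type.
    let widened = children.iter().any(|t| t != &merged);
    println!("merged = {merged:?}\nwidened = {widened}");
}
```

In an actual fix, each child array would then be cast to the merged type (for example with `arrow::compute::cast`) before being handed to `MutableArrayData`, and the `widened` flag would decide the nullability of the result's list item field. Whether `cast` handles every nested, nullability-only change is an assumption worth verifying.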
The relaxation mirrors what `coerce_types`-style coercion does elsewhere in the planner, but applies it at execution time, when the input arrays still disagree. The planner cannot always normalize these types, e.g. when the array is built from disjoint sources such as Delta CDF struct literals.

## Why this matters

It blocks native execution of any plan that produces struct elements from multiple sources (CDF writes, `UNION ALL` inside an `array()`, manually constructed plans that bypass `TypeCoercion`). The workaround today is for callers to insert explicit casts upstream or to fall back to a non-DataFusion evaluator, and both cost performance.

## Related caller-side mitigation (for context)

Comet just landed a serde-side decline in [4cb9b4dc](https://github.com/apache/datafusion-comet/commit/) that falls back to Spark's JVM evaluator when the children of `CreateArray` have different `DataType`s. That fix is conservative but gives up native execution. Upstreaming the relaxation here would let downstream projects keep native execution and would help any other Arrow-based engine that hits the same shape.

I can put up a PR if the approach lands well.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
