schenksj opened a new issue, #22366:
URL: https://github.com/apache/datafusion/issues/22366

   ## Summary
   
   `make_array` (in `datafusion-functions-nested`) panics when called with 
arrays whose element types share the same shape but differ in nested-field 
nullability. Spark, Postgres, and `arrow::compute::concat` all accept this and 
widen `nullable` to `true` in the result type. DataFusion's `make_array_inner` 
is stricter, so the panic reaches any caller that builds `array(...)` over 
child expressions produced by different sources.
   
   ## Repro symptom
   
   This surfaced in real-world use in 
[apache/datafusion-comet](https://github.com/apache/datafusion-comet) on a 
Delta Lake CDF write that builds `array(struct(id, b, 
_change_type=lit("delete")), struct(id, b, _change_type=col(...)))`. One 
arm's `_change_type` is non-nullable `Utf8` (from a literal); the other's is 
nullable `Utf8`:
   
   ```
   panicked at arrow-data-58.2.0/src/transform/mod.rs:422:
   assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
    left: Struct([Field { name: "id", data_type: Int64, nullable: true },
                  Field { name: "b",  data_type: Int32 },
                  Field { name: "_change_type", data_type: Utf8 }])
   right: Struct([Field { name: "id", data_type: Int64, nullable: true },
                  Field { name: "b",  data_type: Int32 },
                  Field { name: "_change_type", data_type: Utf8, nullable: true }])
   ```
   
   Stack: `make_array_inner` → `MutableArrayData::with_capacities`.
   
   ## Proposal
   
   `make_array` should accept element types that are equal under 
nullability-widening (recursively, for nested structs/lists/maps). Concretely:
   
   - Compute the merged element type by walking each child's `DataType` and 
OR-ing the `nullable` flag at every level (this is essentially 
`Field::try_merge` minus the type-promotion arm).
   - Cast each child to the merged type before handing to `MutableArrayData`.
   - Return `ArrayType` with `containsNull = true` (Spark's terms; in Arrow, a 
list whose item field is nullable) if any merge raised a nullability flag.
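
The type-merging step above can be sketched without any Arrow dependency. This is a minimal illustration, not the proposed patch: `Ty`, `Fld`, `fld`, `widen_ty`, `widen_fld`, and `merged_element_type` are hypothetical stand-ins for Arrow's `DataType`/`Field` and for whatever helper a real PR would add. It only covers the nullability OR-ing; it does no type promotion, and the cast of each child to the merged type is not shown.

```rust
// Toy stand-ins for Arrow's `DataType` and `Field` (hypothetical, NOT the arrow-rs types).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    Utf8,
    List(Box<Fld>),
    Struct(Vec<Fld>),
}

#[derive(Debug, Clone, PartialEq)]
struct Fld {
    name: String,
    ty: Ty,
    nullable: bool,
}

// Small constructor to keep examples readable.
fn fld(name: &str, ty: Ty, nullable: bool) -> Fld {
    Fld { name: name.to_string(), ty, nullable }
}

// Merge two types of the same shape, recursing into nested fields.
// Returns `None` when the shapes differ; there is deliberately no type promotion.
fn widen_ty(a: &Ty, b: &Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::List(x), Ty::List(y)) => Some(Ty::List(Box::new(widen_fld(x, y)?))),
        (Ty::Struct(xs), Ty::Struct(ys)) if xs.len() == ys.len() => {
            let merged: Option<Vec<Fld>> =
                xs.iter().zip(ys).map(|(x, y)| widen_fld(x, y)).collect();
            merged.map(Ty::Struct)
        }
        // Leaf types (and anything else) must match exactly.
        _ if a == b => Some(a.clone()),
        _ => None,
    }
}

// Merge two fields: names must match, nullability is OR-ed at this level.
fn widen_fld(a: &Fld, b: &Fld) -> Option<Fld> {
    if a.name != b.name {
        return None;
    }
    Some(Fld {
        name: a.name.clone(),
        ty: widen_ty(&a.ty, &b.ty)?,
        nullable: a.nullable || b.nullable,
    })
}

// Fold all child element types into one merged type.
// Returns `None` for an empty input or for any shape mismatch.
fn merged_element_type(children: &[Ty]) -> Option<Ty> {
    let (first, rest) = children.split_first()?;
    rest.iter().try_fold(first.clone(), |acc, t| widen_ty(&acc, t))
}
```

Applied to the two struct types from the panic message, this sketch would yield the struct whose `_change_type` field is nullable, while `Int32` versus `Int64` would still be rejected rather than promoted.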
   
   This matches what `coerce_types`-style coercion does elsewhere in the 
planner, but applied at execution time when input arrays still disagree (the 
planner can't always normalize, e.g. when the array is built from disjoint 
sources like Delta CDF struct literals).
   
   ## Why this matters
   
   It blocks native execution of any plan that produces struct elements from 
multiple sources (CDF writes, UNION ALL inside an `array()`, 
manually-constructed plans bypassing TypeCoercion). Workaround today: callers 
must insert explicit casts upstream, or fall back to a non-DataFusion evaluator 
— both of which lose perf.
   
   ## Related caller-side mitigation (for context)
   
   Comet just landed a serde-side decline in 
[4cb9b4dc](https://github.com/apache/datafusion-comet/commit/) that falls back 
to Spark's JVM evaluator when `CreateArray`'s children have different 
`DataType`s. That fix is conservative but loses native execution. Upstreaming 
the relaxation here would let downstream projects keep native execution and 
would help any other Arrow-based engine hitting the same shape.
   
   I can put up a PR if the approach lands well.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

