schenksj opened a new pull request, #4533:
URL: https://github.com/apache/datafusion-comet/pull/4533

   ## Which issue does this PR close?
   
   Closes #4528.
   
   ## Rationale for this change
   
   DataFusion's `make_array` asserts strict element-type equality in 
`MutableArrayData::with_capacities` and panics on a mismatch. Spark's 
`CreateArray` decides whether to coerce element types based on `sameType`, 
which ignores nullability, so children that differ only in a nested struct 
field's nullability count as the same type and get no unifying cast. For 
example, `array(struct(a not null), struct(a nullable))` reaches native 
execution with two different struct types and panics:
   
   ```
   native panic: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
   ```
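
To make the mismatch concrete, here is a minimal Python model of the two comparisons. It is a sketch only, not Spark or DataFusion code: `Field`, `Struct`, and `same_type` are hypothetical stand-ins, with `same_type` imitating a nullability-insensitive check in the spirit of Spark's `sameType` and plain `==` imitating the strict equality the native assertion relies on.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DataType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


# In this toy model a leaf type is just a name such as "int" or "string".
DataType = Union[str, Struct]


def same_type(a: DataType, b: DataType) -> bool:
    """Hypothetical nullability-insensitive comparison (models Spark's sameType)."""
    if isinstance(a, Struct) and isinstance(b, Struct):
        return len(a.fields) == len(b.fields) and all(
            x.name == y.name and same_type(x.dtype, y.dtype)
            for x, y in zip(a.fields, b.fields)
        )
    return a == b


not_null = Struct((Field("a", "int", nullable=False),))
nullable = Struct((Field("a", "int", nullable=True),))

# Nullability is ignored here, so no unifying cast would be inserted.
print(same_type(not_null, nullable))
# Strict equality sees two different struct types.
print(not_null == nullable)
```

Under these assumptions the first comparison reports a match while the second does not, which is the gap that lets the plan reach native execution with inconsistent types.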
   
   This is a standalone fix that applies to any such plan. It surfaced while 
working on the Delta Lake contrib integration: Delta's CDC write path builds 
`array(struct(...), struct(...))` plans with one struct per change type, which 
leaves the `_change_type` field's nullability divergent across arms. 
Prioritizing this fix therefore helps that effort.
   
   ## What changes are included in this PR?
   
   `CometCreateArray` now declines (falls back to Spark) when its children's 
types differ in a way `make_array` cannot handle. DataFusion **tolerates** 
differences in container nullability (`ArrayType.containsNull` / 
`MapType.valueContainsNull` are coerced) but not in a struct field's 
nullability. The check therefore normalizes container nullability before 
comparing and keeps struct field nullability significant, so it declines only 
the cases that actually panic and does not over-decline legitimate arrays of 
arrays/maps that differ only in `containsNull`.
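
The normalize-then-compare idea can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the code in this PR: the real check lives in `CometCreateArray` and operates on Spark's `DataType`, whereas `Field`, `Struct`, `Array`, `Map`, `normalize`, and `can_run_natively` below are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DataType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    element: "DataType"
    contains_null: bool


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"
    value_contains_null: bool


# In this toy model a leaf type is just a name such as "int" or "string".
DataType = Union[str, Struct, Array, Map]


def normalize(dt: DataType) -> DataType:
    """Erase container nullability but keep struct field nullability."""
    if isinstance(dt, Array):
        return Array(normalize(dt.element), True)
    if isinstance(dt, Map):
        return Map(normalize(dt.key), normalize(dt.value), True)
    if isinstance(dt, Struct):
        return Struct(
            tuple(Field(f.name, normalize(f.dtype), f.nullable) for f in dt.fields)
        )
    return dt


def can_run_natively(children: Sequence[DataType]) -> bool:
    """True when all children agree after normalization (hypothetical helper)."""
    return len({normalize(c) for c in children}) <= 1


ct_not_null = Struct((Field("ct", "string", nullable=False),))
ct_nullable = Struct((Field("ct", "string", nullable=True),))

# Struct field nullability differs: decline and fall back to Spark.
print(can_run_natively([ct_not_null, ct_nullable]))
# Only containsNull differs: still eligible for native execution.
print(can_run_natively([Array("int", False), Array("int", True)]))
```

In this model the first call yields `False` and the second `True`, mirroring the intent that only struct field nullability differences trigger the fallback.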
   
   This tracks upstream apache/datafusion#22366; the caller-side decline can be 
removed once that fix lands.
   
   ## How are these changes tested?
   
   New test in `CometArrayExpressionSuite` builds `array(struct(id, ct not 
null), struct(id, ct nullable))` and asserts correct results. The test fails on 
`main` with the native `MutableArrayData` panic and passes with this change. 
The full `CometArrayExpressionSuite` (40/40) passes, including `arrays_overlap 
- nested array null handling` which exercises arrays differing only in 
`containsNull` and must still run natively.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

