kosiew opened a new pull request, #19674:
URL: https://github.com/apache/datafusion/pull/19674
## Which issue does this PR close?
* Closes #17285.
---
## Rationale for this change
Casting between `STRUCT` types in DataFusion previously relied on **physical
field order**. When a struct literal or expression had the same fields but in a
different order than the target schema, DataFusion could silently assign values
to the wrong fields (positional matching), causing incorrect results and
potential data corruption.
This PR changes struct-to-struct casting behavior to **match and reorder
fields by name**, ensuring that:
* `{b: 3, a: 4}::STRUCT(a INT, b INT)` yields `{a: 4, b: 3}`
* nested structs are handled consistently
* missing fields are filled with nulls and extra fields are ignored (where
supported by the runtime cast)
This addresses the bug described in #17285 (and aligns with the discussion
in #14396 / #17281).
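
The name-based rule described above can be illustrated with a minimal, std-only sketch. It is a toy model, not DataFusion's implementation: the `Value` and `Type` enums and the `cast_by_name` function are hypothetical names introduced here for illustration, and the real code operates on Arrow arrays rather than on single values.

```rust
/// Toy value model (hypothetical, not a DataFusion type).
/// A struct is an ordered list of (field name, value) pairs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Null,
    Struct(Vec<(String, Value)>),
}

/// Toy target type (hypothetical, not an Arrow `DataType`).
#[derive(Debug, Clone)]
enum Type {
    Int,
    Struct(Vec<(String, Type)>),
}

/// Cast `value` to `target`, matching struct fields by name.
/// - Output fields follow the order of the target type.
/// - Target fields missing from the source become `Value::Null`.
/// - Source fields absent from the target are dropped.
/// - Nested structs are handled by recursion.
fn cast_by_name(value: &Value, target: &Type) -> Value {
    match (value, target) {
        (Value::Struct(fields), Type::Struct(target_fields)) => Value::Struct(
            target_fields
                .iter()
                .map(|(name, ty)| {
                    let cast = fields
                        .iter()
                        .find(|(n, _)| n == name)
                        .map(|(_, v)| cast_by_name(v, ty))
                        .unwrap_or(Value::Null);
                    (name.clone(), cast)
                })
                .collect(),
        ),
        // Assumption of this sketch: non-struct values pass through unchanged.
        (other, _) => other.clone(),
    }
}
```

With this model, casting the toy equivalent of `{b: 3, a: 4}` to a target with fields `a, b` produces the fields in target order with each value attached to its own name, which is the behavior the first bullet above describes.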
---
## What changes are included in this PR?
* **Name-based struct casting at runtime**
  * Added `cast_struct_array_by_name` in `datafusion_common::nested_struct`
    as a small wrapper around existing struct casting logic.
  * Updated `ScalarValue::cast_to_with_options` to use name-based struct
    casting when both source and target are structs.
  * Updated `ColumnarValue::cast_to` (array path) to route struct casts
    through name-based matching/reordering and fall back to Arrow casting for
    non-struct types.
* **Struct type coercion improvements for binary expressions**
  * Updated `struct_coercion` to attempt **name-based field alignment**
    first (when there is name overlap), and to **fall back to positional
    coercion** when names don't match (this preserves backward compatibility
    for unnamed/positional patterns).
* **Planner / physical cast permissiveness for struct-to-struct**
  * Updated `ExprSchemable` cast checks to allow struct-to-struct casts
    during planning even when Arrow's `can_cast_types` would reject them,
    deferring detailed matching to runtime.
  * Updated physical `cast_with_options` to similarly allow struct-to-struct
    casts (including field-count mismatches) so execution can apply name-based
    casting.
* **Optimizer safety: avoid const-folding problematic struct casts**
  * Added a guard in `simplify_expressions` to skip const-folding struct
    casts when field counts mismatch, preventing potential optimizer hangs.
* **Tests and sqllogictest coverage**
  * Added unit tests validating:
    * field reordering matches by name
    * missing target fields produce nulls
  * Updated/extended SQL logic tests (`struct.slt`, `case.slt`) to cover:
    * out-of-order struct literals now working
    * arrays of structs with different field order
    * nested struct reordering
    * casts with missing/extra fields
    * clarified CASE coercion expectations with name-based behavior
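
The planner permissiveness and the const-folding guard listed above can be sketched as two small predicates. This is a std-only illustration under stated assumptions: the toy `DataType` enum and the functions `planner_allows_cast` and `can_const_fold_cast` are hypothetical names, and the `arrow_can_cast` closure merely stands in for Arrow's `can_cast_types`. The actual checks in `ExprSchemable` and `simplify_expressions` may be structured differently.

```rust
/// Toy data type (hypothetical, not Arrow's `DataType`).
#[derive(Debug, Clone)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Planning-time check: struct-to-struct casts are always accepted and the
/// detailed field matching is deferred to runtime. Every other pair is
/// delegated to `arrow_can_cast`, a stand-in for Arrow's `can_cast_types`.
fn planner_allows_cast(
    from: &DataType,
    to: &DataType,
    arrow_can_cast: impl Fn(&DataType, &DataType) -> bool,
) -> bool {
    match (from, to) {
        (DataType::Struct(_), DataType::Struct(_)) => true,
        _ => arrow_can_cast(from, to),
    }
}

/// Optimizer guard: a struct-to-struct cast is only const-folded when the
/// field counts agree. Non-struct casts are unaffected by this guard.
fn can_const_fold_cast(from: &DataType, to: &DataType) -> bool {
    match (from, to) {
        (DataType::Struct(a), DataType::Struct(b)) => a.len() == b.len(),
        _ => true,
    }
}
```

In this model a cast from a two-field struct to a one-field struct passes planning but is left unfolded by the optimizer, so the mismatch is resolved by the name-based cast at execution time.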
---
## Are these changes tested?
Yes.
* Added Rust unit tests in `datafusion/expr-common/src/columnar_value.rs`:
  * `cast_struct_by_field_name`
  * `cast_struct_missing_field_inserts_nulls`
* Updated and expanded sqllogictest files:
  * `datafusion/sqllogictest/test_files/struct.slt`
  * `datafusion/sqllogictest/test_files/case.slt`
These tests cover field reordering, missing/extra fields, and nested structs,
and check that behavior is consistent across planning and execution.
---
## Are there any user-facing changes?
Yes.
* **Behavior change:** Struct-to-struct casts now match fields **by name**
rather than position. This prevents silent mis-assignment when schemas differ
only by field order.
* Struct literals and arrays of structs with fields in different orders that
previously errored (or produced incorrect results) may now succeed.
Potential considerations:
* Users relying on positional behavior when field names don’t align may
observe changes; coercion falls back to positional matching only when there is
no name overlap.
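
The fallback rule in the last bullet can be sketched as follows. This is a std-only illustration, not the body of `struct_coercion`: `Alignment`, `choose_alignment`, and `pair_fields` are hypothetical names, and the handling of unmatched fields (returning `None`) is an assumption of the sketch rather than documented behavior.

```rust
/// How two struct types are aligned during coercion (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Alignment {
    ByName,
    Positional,
}

/// Use name-based alignment when the two field lists share at least one
/// name; otherwise fall back to positional alignment.
fn choose_alignment(left: &[&str], right: &[&str]) -> Alignment {
    if left.iter().any(|name| right.contains(name)) {
        Alignment::ByName
    } else {
        Alignment::Positional
    }
}

/// For each field index on the left, return the index of its partner on the
/// right. Assumption of this sketch: a field without a partner maps to `None`.
fn pair_fields(left: &[&str], right: &[&str]) -> Vec<(usize, Option<usize>)> {
    let by_name = choose_alignment(left, right) == Alignment::ByName;
    left.iter()
        .enumerate()
        .map(|(i, name)| {
            let partner = if by_name {
                // Match on the field name, wherever it sits on the right.
                right.iter().position(|r| r == name)
            } else if i < right.len() {
                // No shared names: pair fields by position.
                Some(i)
            } else {
                None
            };
            (i, partner)
        })
        .collect()
}
```

Under this model, field lists `a, b` and `b, a` are paired by name, while lists with no shared names such as `c0, c1` and `x, y` keep the previous positional pairing.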
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.