kosiew opened a new pull request, #20814:
URL: https://github.com/apache/datafusion/pull/20814
## Which issue does this PR close?
* Part of #20164
## Rationale for this change
Physical `CastExpr` previously stored only a target `DataType`. This caused
field-level semantics (name, nullability, and metadata) to be lost when casts
were represented in the physical layer. In contrast, logical expressions
already carry this information through `FieldRef`.
This mismatch created several issues:
* Physical and logical cast representations diverged in how they preserve
schema semantics.
* Struct casting logic behaved differently depending on whether the cast was
represented as `CastExpr` or `CastColumnExpr`.
* Downstream components (such as schema rewriting and ordering equivalence
analysis) required additional branching and duplicated logic.
Making `CastExpr` field-aware aligns the physical representation with
logical semantics and enables consistent schema propagation across execution
planning and expression evaluation.
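The contrast can be sketched in a few lines. This is a minimal, self-contained illustration, not DataFusion's code: `DataType`, `Field`, and `FieldRef` are simplified stand-ins for the Arrow types, `TypeOnlyCast` and `FieldAwareCast` are hypothetical names, and the "before" behavior (inheriting everything but the type from the input field) is an assumption about the old implementation.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Simplified stand-ins for the Arrow types (assumption: only the parts
// the PR description talks about are modeled here).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

type FieldRef = Arc<Field>;

// Hypothetical "before" shape: only the target type is stored.
struct TypeOnlyCast {
    cast_type: DataType,
}

impl TypeOnlyCast {
    // Everything except the type is inherited from the input field, so the
    // intended output name, nullability, and metadata cannot be expressed.
    fn return_field(&self, input: &Field) -> FieldRef {
        Arc::new(Field {
            data_type: self.cast_type.clone(),
            ..input.clone()
        })
    }
}

// Hypothetical "after" shape: the full target field is stored.
struct FieldAwareCast {
    target_field: FieldRef,
}

impl FieldAwareCast {
    // The stored field is handed back unchanged, so nothing is lost.
    fn return_field(&self) -> FieldRef {
        Arc::clone(&self.target_field)
    }
}

// Example target: a non-nullable Int64 field with one metadata entry.
fn example_target() -> FieldRef {
    let mut metadata = BTreeMap::new();
    metadata.insert("unit".to_string(), "ms".to_string());
    Arc::new(Field {
        name: "a_ms".to_string(),
        data_type: DataType::Int64,
        nullable: false,
        metadata,
    })
}
```

With a nullable `Int32` input named `a`, the type-only variant can only report a nullable `Int64` field still named `a`, while the field-aware variant reports the target field exactly as the caller described it.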
## What changes are included in this PR?
This PR introduces field-aware semantics to `CastExpr` and simplifies
several areas that previously relied on type-only casting.
Key changes include:
1. **Field-aware CastExpr**
   * Replace the `cast_type: DataType` field with `target_field: FieldRef`.
   * Add `new_with_target_field` constructor to explicitly construct
     field-aware casts.
   * Keep the existing `new(expr, DataType)` constructor as a compatibility
     shim that creates a canonical field.
2. **Return-field and nullability behavior**
   * `return_field` now returns the full `target_field`, preserving name,
     nullability, and metadata.
   * `nullable()` now derives its result from the resolved target field
     rather than the input expression.
   * Add compatibility logic for legacy type-only casts to preserve previous
     behavior.
3. **Struct cast validation improvements**
   * Struct-to-struct casting now validates compatibility using field
     information before execution.
   * Planning-time validation prevents unsupported casts from reaching
     execution.
4. **Shared cast property logic**
   * Introduce shared helper functions (`cast_expr_properties`,
     `is_order_preserving_cast_family`) for determining ordering preservation.
   * Reuse this logic in both `CastExpr` and `CastColumnExpr` to avoid
     duplicated implementations.
5. **Schema rewriter improvements**
   * Refactor physical column resolution into `resolve_physical_column`.
   * Simplify cast insertion logic when logical and physical fields differ.
   * Pass explicit physical and logical fields to cast creation for improved
     correctness.
6. **Ordering equivalence simplification**
   * Introduce `substitute_cast_like_ordering` helper to unify handling of
     `CastExpr` and `CastColumnExpr` in ordering equivalence analysis.
7. **Additional unit tests**
   * Validate metadata propagation through `return_field`.
   * Verify nullability behavior for field-aware casts.
   * Ensure legacy type-only casts preserve existing semantics.
   * Test struct-cast validation with nested field semantics.
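As an illustration of item 3, planning-time struct validation can be sketched as a recursive, name-based check over fields. The types below are simplified stand-ins for Arrow's `DataType` and `Field`, the function names `validate_field_cast` and `validate_struct_cast` are hypothetical, and the specific rules (name matching, nullable-to-non-nullable rejection, `Int32` to `Int64` widening) are assumptions chosen for the example rather than the rules this PR implements.

```rust
// Simplified stand-ins for Arrow's `DataType` and `Field` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Small constructor to keep the examples short.
fn field(name: &str, data_type: DataType, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type, nullable }
}

// Hypothetical check for one source/target field pair.
fn validate_field_cast(source: &Field, target: &Field) -> Result<(), String> {
    // A nullable source cannot feed a non-nullable target.
    if source.nullable && !target.nullable {
        return Err(format!(
            "field '{}' is nullable in the source but not in the target",
            target.name
        ));
    }
    match (&source.data_type, &target.data_type) {
        // Nested structs are validated field by field.
        (DataType::Struct(src), DataType::Struct(tgt)) => validate_struct_cast(src, tgt),
        // Identical types always pass.
        (s, t) if s == t => Ok(()),
        // One illustrative widening cast.
        (DataType::Int32, DataType::Int64) => Ok(()),
        (s, t) => Err(format!(
            "cannot cast {:?} to {:?} for field '{}'",
            s, t, target.name
        )),
    }
}

// Hypothetical check for a whole struct: target fields are matched to
// source fields by name; extra source fields are ignored.
fn validate_struct_cast(source: &[Field], target: &[Field]) -> Result<(), String> {
    for tgt in target {
        match source.iter().find(|src| src.name == tgt.name) {
            Some(src) => validate_field_cast(src, tgt)?,
            // A target field absent from the source would be filled with
            // nulls, so it has to be nullable.
            None if tgt.nullable => {}
            None => {
                return Err(format!(
                    "non-nullable target field '{}' has no source field",
                    tgt.name
                ))
            }
        }
    }
    Ok(())
}
```

Running such a check while the plan is built means an incompatible struct cast is reported as a planning error instead of failing partway through execution.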
## Are these changes tested?
Yes.
New unit tests were added in `physical-expr/src/expressions/cast.rs` to
verify:
* Metadata propagation through field-aware casts
* Correct nullability behavior derived from the target field
* Backward compatibility with legacy type-only constructors
* Struct cast compatibility validation using nested fields
Existing tests continue to pass and validate compatibility with the previous
API behavior.
## Are there any user-facing changes?
There are no direct user-facing behavior changes.
This change primarily improves internal schema semantics and consistency in
the physical expression layer. Existing APIs remain compatible through the
legacy constructor that accepts only a `DataType`.
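One way such a compatibility shim could work is sketched below. This is a hedged illustration only: the `Cast` struct, the `type_only` flag, and the placeholder field built by `new` (empty name, nullable) are assumptions made for the example, and the real `CastExpr` also carries the child expression and cast options, which are omitted here.

```rust
use std::sync::Arc;

// Simplified stand-ins for the Arrow types (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

type FieldRef = Arc<Field>;

// Hypothetical cast node holding only what the example needs.
struct Cast {
    target_field: FieldRef,
    // Assumption: a marker records that the cast was built from a bare type.
    type_only: bool,
}

impl Cast {
    // Legacy-style constructor: wrap the type in a placeholder field.
    fn new(cast_type: DataType) -> Self {
        let canonical = Field {
            name: String::new(),
            data_type: cast_type,
            nullable: true,
        };
        Cast { target_field: Arc::new(canonical), type_only: true }
    }

    // Field-aware constructor: the caller supplies the full target field.
    fn new_with_target_field(target_field: FieldRef) -> Self {
        Cast { target_field, type_only: false }
    }

    // Field-aware casts answer from the target field; type-only casts keep
    // following the input expression, as they did before.
    fn nullable(&self, input_nullable: bool) -> bool {
        if self.type_only {
            input_nullable
        } else {
            self.target_field.nullable
        }
    }
}
```

Under these assumptions, callers that only ever pass a `DataType` see the same nullability as before, while callers that supply a target field get exactly the nullability they declared.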
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.