kosiew opened a new pull request, #20202:
URL: https://github.com/apache/datafusion/pull/20202
## Which issue does this PR close?
* Closes #20162.
## Rationale for this change
DataFusion’s physical expression adapter needs a reliable, schema-aware way
to cast columns—especially nested `Struct` columns—while honouring field-level
nullability metadata.
Today, casting pathways often depend on Arrow’s `CastOptions<'static>` /
`FormatOptions<'static>`, which are awkward for long-lived expressions (and
effectively require static string lifetimes). This makes it hard to propagate
dynamic formatting options (e.g. from SQL, IPC, or protobuf) without leaking or
interning strings.
This PR introduces:
* A dedicated `CastColumnExpr` physical expression for struct-aware casting
with explicit input/target fields.
* Owned cast/format options (`OwnedCastOptions`, `OwnedFormatOptions`) so
format strings can be carried safely across planning/serialization without
requiring `'static` lifetimes.
Together, these changes improve correctness (nullability validation),
reliability (schema-accurate casting), and extensibility (future cast
formatting support).
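
The ownership problem described above can be sketched in a few lines. This is a minimal illustration, not the PR's actual code: `FormatOptions` and `CastOptions` below are cut-down stand-ins for Arrow's borrowed types (the real ones have more fields), the field set of the owned types is assumed, and the `as_borrowed` method name is hypothetical, since the description only says that conversion helpers exist.

```rust
// Cut-down stand-ins for Arrow's borrowed option types (illustrative only;
// the real types live in the `arrow` crate and carry more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct FormatOptions<'a> {
    pub null: &'a str,
    pub date_format: Option<&'a str>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CastOptions<'a> {
    pub safe: bool,
    pub format_options: FormatOptions<'a>,
}

/// Owned counterpart: holds `String`s, so it can be built from runtime data
/// (SQL text, a decoded protobuf message) and stored in a long-lived
/// expression without leaking or interning strings.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct OwnedFormatOptions {
    pub null: String,
    pub date_format: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct OwnedCastOptions {
    pub safe: bool,
    pub format_options: OwnedFormatOptions,
}

impl OwnedFormatOptions {
    /// Hypothetical helper: borrow as the lifetime-parameterised form.
    /// The result is valid for as long as `self` is alive.
    pub fn as_borrowed(&self) -> FormatOptions<'_> {
        FormatOptions {
            null: self.null.as_str(),
            date_format: self.date_format.as_deref(),
        }
    }
}

impl OwnedCastOptions {
    /// Hypothetical helper: build a short-lived borrowed `CastOptions`.
    pub fn as_borrowed(&self) -> CastOptions<'_> {
        CastOptions {
            safe: self.safe,
            format_options: self.format_options.as_borrowed(),
        }
    }
}

/// A cast entry point that accepts any lifetime rather than `'static`,
/// so options borrowed from an owned value can be passed straight in.
pub fn describe_cast(options: &CastOptions<'_>) -> String {
    format!(
        "safe={}, null={:?}, date_format={:?}",
        options.safe, options.format_options.null, options.format_options.date_format
    )
}
```

The expression stores the owned value and borrows from it only at the point where a cast is actually performed, so no string ever needs a `'static` lifetime.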
## What changes are included in this PR?
* **Owned formatting + cast options**
  * Added `OwnedFormatOptions` (owned `String`-based variant of Arrow’s
    `FormatOptions`).
  * Added `OwnedCastOptions` (pairs `safe` + `OwnedFormatOptions`) with
    conversion helpers to Arrow `CastOptions<'a>`.
  * Re-exported these types from `datafusion_common`.
* **CastOptions lifetime improvements**
  * Updated scalar/columnar casting APIs to accept `CastOptions<'_>` instead
    of requiring `CastOptions<'static>`.
  * Updated `ColumnarValue::cast_to` to avoid cloning options unnecessarily
    and to cleanly fall back to defaults.
* **Schema-aware, struct-aware CastColumnExpr**
  * Reworked `CastColumnExpr` to:
    * Store `OwnedCastOptions` and an `input_schema` for proper column
      resolution.
    * Add `new_with_schema(...)` constructor for cases where expression
      resolution depends on a broader schema.
    * Validate cast compatibility up-front (including index bounds checks
      and nullability constraints).
    * Use `validate_struct_compatibility` and newly-exported
      `validate_field_compatibility` for consistent checks across scalar and
      nested contexts.
* **PhysicalExprAdapter improvements**
  * Adapter now uses `validate_field_compatibility` to validate non-struct
    casts, producing clearer errors.
  * Uses `CastColumnExpr::new_with_schema(...)` to ensure the constructed
    cast expression is schema-accurate when columns are rewritten/reindexed.
* **Nested struct casting consistency**
  * Exposed `validate_field_compatibility` as `pub` and aligned struct
    validation behavior.
  * Minor cleanup to field matching wording/comments and some test
    expectations.
* **Proto updates (physical expr + options)**
  * Added protobuf support for `PhysicalCastColumnNode` and
    `PhysicalCastOptions`.
  * Added protobuf `FormatOptions` + `DurationFormat` to represent owned
    formatting options.
  * Kept backward compatibility fields (`safe`, `format_options`) in
    `PhysicalCastColumnNode` with a deprecation note and precedence rule.
  * Removed deprecated/unused protobuf messages/fields (e.g.
    `BufferExecNode`, `FileOutputMode` / `file_output_mode`, `expr_id`).
* **Tests and fixture adjustments**
  * Added/updated unit tests for:
    * nullable → non-nullable cast rejection
    * schema mismatch errors
    * struct casting behavior with missing children
  * Updated a number of parquet-related tests/schemas to mark columns as
    nullable where appropriate to avoid invalid nullable→non-nullable casts
    after stricter validation.
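
The up-front validation listed above (index bounds, nullability, type compatibility) might look roughly like the sketch below. It is a simplified illustration rather than DataFusion's implementation: `DataType` and `Field` are toy stand-ins for the Arrow types, `check_field_compatibility` and `check_cast_column` are hypothetical names, and the castability rule in `can_cast` is invented for the example.

```rust
/// Toy stand-ins for Arrow's `DataType` and `Field` (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type, nullable }
    }
}

/// Invented castability rule: identical types, or widening Int32 -> Int64.
fn can_cast(from: &DataType, to: &DataType) -> bool {
    from == to || matches!((from, to), (DataType::Int32, DataType::Int64))
}

/// Hypothetical field-level check: reject nullable -> non-nullable casts
/// and casts between incompatible types.
pub fn check_field_compatibility(source: &Field, target: &Field) -> Result<(), String> {
    if source.nullable && !target.nullable {
        return Err(format!(
            "cannot cast nullable field '{}' to non-nullable field '{}'",
            source.name, target.name
        ));
    }
    if !can_cast(&source.data_type, &target.data_type) {
        return Err(format!(
            "cannot cast field '{}' from {:?} to {:?}",
            source.name, source.data_type, target.data_type
        ));
    }
    Ok(())
}

/// Hypothetical constructor-time check: resolve the column by index in the
/// input schema (with a bounds check), then validate it against the target.
pub fn check_cast_column(
    input_schema: &[Field],
    column_index: usize,
    target: &Field,
) -> Result<(), String> {
    let source = input_schema.get(column_index).ok_or_else(|| {
        format!(
            "column index {} is out of bounds for a schema with {} fields",
            column_index,
            input_schema.len()
        )
    })?;
    check_field_compatibility(source, target)
}
```

Running these checks when the expression is constructed, rather than when a batch is evaluated, means a bad cast fails at planning time with an error that names the offending field.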
## Are these changes tested?
Yes.
* Added new unit tests in `cast_column.rs` covering:
  * schema mismatch (type incompatibility)
  * nullability enforcement (nullable → non-nullable rejection)
* Updated existing tests in:
  * `nested_struct.rs`
  * parquet filter / adapter tests
  * physical expr adapter tests
These tests validate both the new expression behavior and the updated
validation rules.
## Are there any user-facing changes?
Potentially yes:
* **Stricter nullability enforcement during casting**: casts that previously
silently allowed nullable → non-nullable coercions may now be rejected earlier
with clearer errors.
* **Improved struct casting behavior**: struct fields are matched by name
and validated consistently; target fields missing from the source are filled
with nulls (when allowed), and extra source fields are ignored.
* **Better error messages** for incompatible casts during physical plan
adaptation.
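
To make the struct-casting rule concrete, here is a minimal sketch of name-based field matching over a single struct "row". It is not the PR's code: `TargetField` and `adapt_struct_row` are hypothetical names, values are plain `Option<i64>` rather than Arrow arrays, and "when allowed" is assumed here to mean "when the target field is nullable".

```rust
/// Hypothetical description of one field in the target struct.
#[derive(Debug, Clone, PartialEq)]
pub struct TargetField {
    pub name: String,
    pub nullable: bool,
}

impl TargetField {
    pub fn new(name: &str, nullable: bool) -> Self {
        Self { name: name.to_string(), nullable }
    }
}

/// Build the target struct from the source by matching fields on name:
/// - a target field found in the source takes the source value;
/// - a target field absent from the source becomes null if it is nullable
///   (assumed meaning of "when allowed"), otherwise the cast is rejected;
/// - source fields with no counterpart in the target are ignored.
pub fn adapt_struct_row(
    source: &[(&str, Option<i64>)],
    target: &[TargetField],
) -> Result<Vec<(String, Option<i64>)>, String> {
    target
        .iter()
        .map(|t| {
            match source.iter().find(|(name, _)| *name == t.name.as_str()) {
                Some((_, value)) => Ok((t.name.clone(), *value)),
                None if t.nullable => Ok((t.name.clone(), None)),
                None => Err(format!(
                    "target field '{}' is missing from the source and is not nullable",
                    t.name
                )),
            }
        })
        .collect()
}
```

Because the output is driven by the target field list, its field order always follows the target schema, whatever order the source uses.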
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]