kosiew opened a new pull request, #20202:
URL: https://github.com/apache/datafusion/pull/20202

   
   ## Which issue does this PR close?
   
   * Closes #20162.
   
   ## Rationale for this change
   
   DataFusion’s physical expression adapter needs a reliable, schema-aware way 
to cast columns—especially nested `Struct` columns—while honouring field-level 
nullability metadata.
   
   Today, casting pathways often depend on Arrow’s `CastOptions<'static>` / 
`FormatOptions<'static>`, which are awkward for long-lived expressions (and 
effectively require static string lifetimes). This makes it hard to propagate 
dynamic formatting options (e.g. from SQL, IPC, or protobuf) without leaking or 
interning strings.
   
   This PR introduces:
   
   * A dedicated `CastColumnExpr` physical expression for struct-aware casting 
with explicit input/target fields.
   * Owned cast/format options (`OwnedCastOptions`, `OwnedFormatOptions`) so 
format strings can be carried safely across planning/serialization without 
requiring `'static` lifetimes.
   
   Together, these changes improve correctness (nullability validation), 
reliability (schema-accurate casting), and extensibility (future cast 
formatting support).
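
To make the lifetime argument concrete, here is a minimal, std-only sketch of the owned-versus-borrowed pattern described above. All types here (`BorrowedFormat`, `BorrowedCast`, `OwnedFormat`, `OwnedCast`) and the helpers `render_null` and `demo_render` are hypothetical stand-ins invented for illustration. They are not the Arrow or DataFusion definitions, and the real `OwnedCastOptions` may carry more fields and use different method names.

```rust
// Hypothetical stand-ins: NOT the Arrow/DataFusion types.

/// Borrowed formatting options, similar in shape to a `FormatOptions<'a>`.
#[derive(Debug, Clone, PartialEq)]
pub struct BorrowedFormat<'a> {
    pub null: &'a str,
    pub date_format: Option<&'a str>,
}

/// Borrowed cast options, similar in shape to a `CastOptions<'a>`.
#[derive(Debug, Clone, PartialEq)]
pub struct BorrowedCast<'a> {
    pub safe: bool,
    pub format: BorrowedFormat<'a>,
}

/// Owned counterpart: stores `String`s, so it can live inside a
/// long-lived expression or be rebuilt from a serialized plan.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct OwnedFormat {
    pub null: String,
    pub date_format: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct OwnedCast {
    pub safe: bool,
    pub format: OwnedFormat,
}

impl OwnedCast {
    /// Lend a borrowed view whose lifetime is tied to `self`,
    /// so no `'static` strings (and no leaking) are required.
    pub fn as_borrowed(&self) -> BorrowedCast<'_> {
        BorrowedCast {
            safe: self.safe,
            format: BorrowedFormat {
                null: self.format.null.as_str(),
                date_format: self.format.date_format.as_deref(),
            },
        }
    }
}

/// A kernel-like function that only needs the options during the call.
pub fn render_null(opts: &BorrowedCast<'_>) -> String {
    opts.format.null.to_string()
}

/// Usage: the null marker is built at runtime (e.g. decoded from a plan).
pub fn demo_render() -> String {
    let null_marker = format!("<{}>", "NULL");
    let owned = OwnedCast {
        safe: true,
        format: OwnedFormat { null: null_marker, date_format: None },
    };
    let rendered = render_null(&owned.as_borrowed());
    rendered
}
```

The design choice in this sketch is that the expression owns its strings, and a borrowed view is created only at the call site that needs one.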
   
   ## What changes are included in this PR?
   
   * **Owned formatting + cast options**
   
     * Added `OwnedFormatOptions` (owned `String`-based variant of Arrow’s 
`FormatOptions`).
     * Added `OwnedCastOptions` (pairs `safe` + `OwnedFormatOptions`) with 
conversion helpers to Arrow `CastOptions<'a>`.
     * Re-exported these types from `datafusion_common`.
   
   * **CastOptions lifetime improvements**
   
     * Updated scalar/columnar casting APIs to accept `CastOptions<'_>` instead 
of requiring `CastOptions<'static>`.
     * Updated `ColumnarValue::cast_to` to avoid cloning options unnecessarily 
and to cleanly fall back to defaults.
   
   * **Schema-aware, struct-aware CastColumnExpr**
   
     * Reworked `CastColumnExpr` to:
   
       * Store `OwnedCastOptions` and an `input_schema` for proper column 
resolution.
       * Add `new_with_schema(...)` constructor for cases where expression 
resolution depends on a broader schema.
       * Validate cast compatibility up-front (including index bounds checks 
and nullability constraints).
       * Use `validate_struct_compatibility` and newly-exported 
`validate_field_compatibility` for consistent checks across scalar and nested 
contexts.
   
   * **PhysicalExprAdapter improvements**
   
     * Adapter now uses `validate_field_compatibility` to validate non-struct 
casts, producing clearer errors.
     * Uses `CastColumnExpr::new_with_schema(...)` to ensure the constructed 
cast expression is schema-accurate when columns are rewritten/reindexed.
   
   * **Nested struct casting consistency**
   
     * Exposed `validate_field_compatibility` as `pub` and aligned struct 
validation behavior.
     * Minor cleanup to field matching wording/comments and some test 
expectations.
   
   * **Proto updates (physical expr + options)**
   
     * Added protobuf support for `PhysicalCastColumnNode` and 
`PhysicalCastOptions`.
     * Added protobuf `FormatOptions` + `DurationFormat` to represent owned 
formatting options.
     * Kept the backward-compatibility fields (`safe`, `format_options`) in 
`PhysicalCastColumnNode` with a deprecation note and precedence rule.
     * Removed deprecated/unused protobuf messages/fields (e.g. 
`BufferExecNode`, `FileOutputMode` / `file_output_mode`, `expr_id`).
   
   * **Tests and fixture adjustments**
   
     * Added/updated unit tests for:
   
       * nullable → non-nullable cast rejection
       * schema mismatch errors
       * struct casting behavior with missing children
     * Updated several Parquet-related tests/schemas to mark columns as 
nullable where appropriate, avoiding invalid nullable → non-nullable casts 
under the stricter validation.
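
The up-front validation described in the list above can be sketched roughly as follows. This is a std-only illustration under stated assumptions: `Ty`, `FieldSketch`, `can_cast`, and `check_field` are hypothetical names, the tiny cast table is made up for the example, and none of this reflects the actual signature or rules of `validate_field_compatibility`.

```rust
// Hypothetical sketch: NOT the DataFusion implementation.

/// A toy set of data types standing in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int32,
    Int64,
    Utf8,
}

/// A toy field: name, type, and nullability flag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldSketch {
    pub name: String,
    pub ty: Ty,
    pub nullable: bool,
}

impl FieldSketch {
    pub fn new(name: &str, ty: Ty, nullable: bool) -> Self {
        Self { name: name.to_string(), ty, nullable }
    }
}

/// Assumed cast table for this example only (widening and to-string casts).
fn can_cast(from: Ty, to: Ty) -> bool {
    from == to
        || matches!(
            (from, to),
            (Ty::Int32, Ty::Int64) | (Ty::Int32, Ty::Utf8) | (Ty::Int64, Ty::Utf8)
        )
}

/// Reject the cast before any data is touched if it could violate the
/// target's nullability or if the types are not castable.
pub fn check_field(source: &FieldSketch, target: &FieldSketch) -> Result<(), String> {
    if source.nullable && !target.nullable {
        return Err(format!(
            "cannot cast nullable field '{}' to non-nullable field '{}'",
            source.name, target.name
        ));
    }
    if !can_cast(source.ty, target.ty) {
        return Err(format!(
            "cannot cast field '{}' from {:?} to {:?}",
            source.name, source.ty, target.ty
        ));
    }
    Ok(())
}
```

Under these assumptions, a nullable `Int32` source checked against a non-nullable `Int64` target fails on the nullability rule, which is the situation the updated Parquet test schemas avoid by marking the target columns nullable.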
   
   ## Are these changes tested?
   
   Yes.
   
   * Added new unit tests in `cast_column.rs` covering:
   
     * schema mismatch (type incompatibility)
     * nullability enforcement (nullable → non-nullable rejection)
   * Updated existing tests in:
   
     * `nested_struct.rs`
     * parquet filter / adapter tests
     * physical expr adapter tests
   
   These tests validate both the new expression behavior and the updated 
validation rules.
   
   ## Are there any user-facing changes?
   
   Potentially yes:
   
   * **Stricter nullability enforcement during casting**: casts that 
previously allowed nullable → non-nullable coercions silently may now be 
rejected earlier with clearer errors.
   * **Improved struct casting behavior**: struct fields are matched by name 
and validated consistently; target fields missing from the source are filled 
with nulls (when allowed), and extra source fields are ignored.
   * **Better error messages** for incompatible casts during physical plan 
adaptation.
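
As a rough illustration of the struct-matching behavior listed above, the following std-only sketch matches children by name, fills absent children with nulls, and drops extra source children. `Child`, `TargetChild`, and `adapt_struct` are hypothetical names. The sketch also assumes that "when allowed" means the target child is nullable, which is an interpretation rather than something this PR description states.

```rust
// Hypothetical sketch: NOT the DataFusion implementation.

/// A toy struct child: a named column of optional integers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Child {
    pub name: String,
    pub values: Vec<Option<i64>>,
}

/// What the target struct expects for each child.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TargetChild {
    pub name: String,
    pub nullable: bool,
}

/// Build the target struct's children from the source's children.
/// - Children are matched by name, not by position.
/// - A target child absent from the source becomes an all-null column,
///   but only if the target child is nullable (assumption).
/// - Source children not named in the target are ignored.
pub fn adapt_struct(
    source: &[Child],
    target: &[TargetChild],
    num_rows: usize,
) -> Result<Vec<Child>, String> {
    let mut out = Vec::with_capacity(target.len());
    for t in target {
        match source.iter().find(|c| c.name == t.name) {
            Some(c) => out.push(c.clone()),
            None if t.nullable => out.push(Child {
                name: t.name.clone(),
                values: vec![None; num_rows],
            }),
            None => {
                return Err(format!(
                    "non-nullable target field '{}' is missing from the source",
                    t.name
                ))
            }
        }
    }
    Ok(out)
}
```

With a source of `{a, b}` and a target of `{b, c}` where `c` is nullable, this sketch yields `b` unchanged followed by an all-null `c`, and `a` is dropped.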
   
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

