Brijesh-Thakkar opened a new pull request, #22640:
URL: https://github.com/apache/datafusion/pull/22640

   ## Which issue does this PR close?
   
   * Closes #22447
   
   ## Rationale for this change
   
   `LogicalPlan::recompute_schema()` contains special handling for 
`LogicalPlan::Union` to avoid unnecessarily rebuilding the schema when inputs 
have not changed.
   
   However, the current implementation only checks whether the cached schema 
and the first input schema have the same number of fields. This can leave the 
cached schema stale when optimizer rewrites modify field types, names, or 
qualifiers without changing the schema width.
   
   For example, after a type coercion rewrite, a column in the input schemas 
may change from `Int32` to `Int64` while the column count stays the same. In 
this case, `recompute_schema()` incorrectly treats the schema as unchanged and 
returns the stale cached schema.
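The following minimal, self-contained Rust sketch makes the failure mode concrete. It does not use DataFusion's real `DFSchema` or Arrow's `DataType`. `Field`, `DataType`, `field`, and `width_only_unchanged` are hypothetical, simplified stand-ins: a schema is modeled as a list of qualified, typed fields, and the old check is modeled as a plain length comparison.

```rust
// Hypothetical simplified model; not DataFusion's DFSchema or Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
}

fn field(qualifier: Option<&str>, name: &str, data_type: DataType) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        data_type,
    }
}

/// Models the old behaviour: the cached schema counts as "unchanged"
/// whenever it has the same number of fields as the input schema.
fn width_only_unchanged(cached: &[Field], input: &[Field]) -> bool {
    cached.len() == input.len()
}

fn main() {
    // Cached Union schema, built before type coercion ran.
    let cached = vec![field(Some("t1"), "a", DataType::Int32)];
    // First input after a coercion rewrite widened the column to Int64.
    let rewritten = vec![field(Some("t1"), "a", DataType::Int64)];

    // The widths match, so the width-only check keeps the stale Int32 schema.
    assert!(width_only_unchanged(&cached, &rewritten));
    assert_ne!(cached[0].data_type, rewritten[0].data_type);
    println!("stale schema kept: {:?}", cached[0].data_type);
}
```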
   
   ## What changes are included in this PR?
   
   This PR updates the `LogicalPlan::Union` branch in `recompute_schema()` to 
compare schema structure rather than only schema width.
   
   The comparison now verifies:
   
   * Field count
   * Field data types
   * Field names
   * Field qualifiers
   
   If any of these differ between the cached schema and the current input 
schema, the `Union` schema is recomputed using `Union::try_new()`.
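The structural comparison listed above can be illustrated with a simplified Rust model. This is not the code from this PR. `Field`, `DataType`, `schema_matches`, and `recompute` are hypothetical names, and the `rebuild` closure stands in for the call to `Union::try_new()`. Like the previous width-only check, the sketch assumes that the cached schema is compared against the first input's schema.

```rust
// Hypothetical simplified model; not DataFusion's DFSchema or Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
}

fn field(qualifier: Option<&str>, name: &str, data_type: DataType) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        data_type,
    }
}

/// True only if both schemas have the same field count and every field
/// agrees on data type, name, and qualifier.
fn schema_matches(cached: &[Field], input: &[Field]) -> bool {
    cached.len() == input.len()
        && cached.iter().zip(input).all(|(c, i)| {
            c.data_type == i.data_type && c.name == i.name && c.qualifier == i.qualifier
        })
}

/// Keeps the cached schema while it still matches the first input;
/// otherwise calls `rebuild` (a stand-in for `Union::try_new()`).
fn recompute(
    cached: Vec<Field>,
    first_input: &[Field],
    rebuild: impl FnOnce() -> Vec<Field>,
) -> Vec<Field> {
    if schema_matches(&cached, first_input) {
        cached
    } else {
        rebuild()
    }
}

fn main() {
    let cached = vec![field(Some("t1"), "a", DataType::Int32)];
    let input = vec![field(Some("t1"), "a", DataType::Int64)];
    // The types differ, so the schema is rebuilt from the rewritten input.
    let schema = recompute(cached, &input, || input.clone());
    assert_eq!(schema[0].data_type, DataType::Int64);
    println!("recomputed type: {:?}", schema[0].data_type);
}
```

Checking the length first means the per-field `zip` never silently ignores trailing fields when the widths differ.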
   
   Additionally, two regression tests were added:
   
   * `test_recompute_schema_union_type_mismatch`
   * `test_recompute_schema_union_name_mismatch`
   
   These tests verify that schema recomputation occurs when input field types 
or names change while the schema width remains unchanged.
   
   ## Are these changes tested?
   
   Yes.
   
   Added regression tests covering:
   
   1. Input type changes (`Int32` → `Int64`) with identical schema width.
   2. Input column name changes with identical schema width.
   
   The new tests fail with the previous width-only validation logic and pass 
with this change.
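The two regression scenarios can be mirrored in a simplified Rust model. In each case the width-only check reports the schema as unchanged, while a structural check detects the difference. All names here (`Field`, `DataType`, `width_only_unchanged`, `schema_matches`, `type_mismatch_case`, `name_mismatch_case`) are hypothetical. The sketch only approximates the scenarios that `test_recompute_schema_union_type_mismatch` and `test_recompute_schema_union_name_mismatch` cover; the real tests run against DataFusion's `LogicalPlan`.

```rust
// Hypothetical simplified model; not DataFusion's DFSchema or Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
}

fn field(qualifier: Option<&str>, name: &str, data_type: DataType) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        data_type,
    }
}

/// Old check: compares schema width only.
fn width_only_unchanged(cached: &[Field], input: &[Field]) -> bool {
    cached.len() == input.len()
}

/// New check: width plus per-field type, name, and qualifier.
fn schema_matches(cached: &[Field], input: &[Field]) -> bool {
    cached.len() == input.len()
        && cached.iter().zip(input).all(|(c, i)| {
            c.data_type == i.data_type && c.name == i.name && c.qualifier == i.qualifier
        })
}

/// Input type changes from Int32 to Int64; the width stays at one field.
fn type_mismatch_case() -> (Vec<Field>, Vec<Field>) {
    (
        vec![field(Some("t1"), "a", DataType::Int32)],
        vec![field(Some("t1"), "a", DataType::Int64)],
    )
}

/// Input column is renamed from `a` to `b`; the width stays at one field.
fn name_mismatch_case() -> (Vec<Field>, Vec<Field>) {
    (
        vec![field(Some("t1"), "a", DataType::Int32)],
        vec![field(Some("t1"), "b", DataType::Int32)],
    )
}

fn main() {
    for (label, (cached, input)) in [("type", type_mismatch_case()), ("name", name_mismatch_case())] {
        // The old logic misses the change because the widths are identical.
        assert!(width_only_unchanged(&cached, &input), "{label}: widths should match");
        // The structural check detects it, which would trigger a recompute.
        assert!(!schema_matches(&cached, &input), "{label}: change should be detected");
        println!("{label} mismatch detected");
    }
}
```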
   
   The following test suites were also executed successfully:
   
   * `cargo test -p datafusion-expr`
   * `cargo test -p datafusion-optimizer`
   
   ## Are there any user-facing changes?
   
   No user-facing API changes.
   
   This change fixes internal schema propagation for `LogicalPlan::Union` after 
optimizer rewrites and ensures cached schemas remain consistent with rewritten 
inputs.
   

