nathanb9 commented on code in PR #22640:
URL: https://github.com/apache/datafusion/pull/22640#discussion_r3329408879
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -708,19 +708,28 @@ impl LogicalPlan {
}))
}
LogicalPlan::Union(Union { inputs, schema }) => {
- let first_input_schema = inputs[0].schema();
- if schema.fields().len() == first_input_schema.fields().len() {
- // If inputs are not pruned do not change schema
- Ok(LogicalPlan::Union(Union { inputs, schema }))
+ // Recompute what the schema should be from the current inputs.
+ // Comparing the full recomputed schema (not just inputs[0])
correctly
+ // handles: field-count changes, type changes, name/alias
changes,
+ // nullability changes, and metadata changes — all in one
place.
+ //
+ // A note on `Union`s constructed via `try_new_by_name`:
+ //
+ // At this point, the schema for each input should have
+ // the same width. Thus, we do not need to save whether a
+ // `Union` was created `BY NAME`, and can safely rely on the
+ // `try_new` initializer to derive the new schema based on
+ // column positions.
+ let recomputed = Union::try_new(inputs)?;
Review Comment:
Also, after looking at `Union::try_new` more closely, I see that it always
loses metadata.
Metadata may be useful since I see that it is preserved in
`coerce_union_schema_with_schema`
(https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/analyzer/type_coercion.rs)
```rust
/// **Schema-level metadata merging**: Later schemas take precedence for
duplicate keys.
///
/// **Field-level metadata merging**: Later fields take precedence for
duplicate metadata keys.
```
So we might need an API alternative to `try_new` or a change to it so that
we can preserve schema-level and field-level metadata in the case of type
coercion
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]