Copilot commented on code in PR #22640:
URL: https://github.com/apache/datafusion/pull/22640#discussion_r3328367883
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
}
LogicalPlan::Union(Union { inputs, schema }) => {
let first_input_schema = inputs[0].schema();
- if schema.fields().len() == first_input_schema.fields().len() {
- // If inputs are not pruned do not change schema
+ // Check field count AND field types AND field
names/qualifiers.
+ // A width-only check misses cases where inputs were rewritten
with
+ // different types or aliases (e.g. after type-coercion
rewrites).
+ let schemas_match = schema.fields().len() ==
first_input_schema.fields().len()
+ && (0..schema.fields().len()).all(|i| {
+ let (q1, f1) = schema.qualified_field(i);
+ let (q2, f2) = first_input_schema.qualified_field(i);
+ q1 == q2
+ && f1.data_type() == f2.data_type()
+ && f1.name() == f2.name()
+ });
Review Comment:
The match check only compares the cached `schema` against `inputs[0]`. If a
rewrite changes only a non-first input (or inputs diverge temporarily),
`schemas_match` may stay true and skip recomputation, leaving the `Union`
schema stale/incorrect. Consider validating the cached schema against *each*
input schema (or against the recomputed union schema) before deciding to keep
it.
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
}
LogicalPlan::Union(Union { inputs, schema }) => {
let first_input_schema = inputs[0].schema();
- if schema.fields().len() == first_input_schema.fields().len() {
- // If inputs are not pruned do not change schema
+ // Check field count AND field types AND field
names/qualifiers.
+ // A width-only check misses cases where inputs were rewritten
with
+ // different types or aliases (e.g. after type-coercion
rewrites).
+ let schemas_match = schema.fields().len() ==
first_input_schema.fields().len()
+ && (0..schema.fields().len()).all(|i| {
+ let (q1, f1) = schema.qualified_field(i);
+ let (q2, f2) = first_input_schema.qualified_field(i);
+ q1 == q2
+ && f1.data_type() == f2.data_type()
+ && f1.name() == f2.name()
Review Comment:
The structural equality check ignores other field properties that can affect
correctness (notably nullability, and potentially metadata depending on how
schemas are used). This can still incorrectly treat schemas as matching and
skip recomputation. Prefer a full field equality comparison (or additionally
compare `f1.is_nullable()` and any other required properties) so
`recompute_schema()` reflects changes in nullability/metadata introduced by
rewrites.
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
}
LogicalPlan::Union(Union { inputs, schema }) => {
let first_input_schema = inputs[0].schema();
- if schema.fields().len() == first_input_schema.fields().len() {
- // If inputs are not pruned do not change schema
+ // Check field count AND field types AND field
names/qualifiers.
+ // A width-only check misses cases where inputs were rewritten
with
+ // different types or aliases (e.g. after type-coercion
rewrites).
+ let schemas_match = schema.fields().len() ==
first_input_schema.fields().len()
+ && (0..schema.fields().len()).all(|i| {
Review Comment:
Minor: `schema.fields().len()` is computed multiple times and the
index-based loop makes it harder to see what's being compared. Consider storing
the length in a local (e.g., `let len = ...`) and/or iterating with `zip` over
the corresponding qualified fields to make the comparison clearer and avoid
repeated length queries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]