Copilot commented on code in PR #22640:
URL: https://github.com/apache/datafusion/pull/22640#discussion_r3328367883


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
             }
             LogicalPlan::Union(Union { inputs, schema }) => {
                 let first_input_schema = inputs[0].schema();
-                if schema.fields().len() == first_input_schema.fields().len() {
-                    // If inputs are not pruned do not change schema
+                // Check field count AND field types AND field 
names/qualifiers.
+                // A width-only check misses cases where inputs were rewritten 
with
+                // different types or aliases (e.g. after type-coercion 
rewrites).
+                let schemas_match = schema.fields().len() == 
first_input_schema.fields().len()
+                    && (0..schema.fields().len()).all(|i| {
+                        let (q1, f1) = schema.qualified_field(i);
+                        let (q2, f2) = first_input_schema.qualified_field(i);
+                        q1 == q2
+                            && f1.data_type() == f2.data_type()
+                            && f1.name() == f2.name()
+                    });

Review Comment:
   The match check only compares the cached `schema` against `inputs[0]`. If a 
rewrite changes only a non-first input (or inputs diverge temporarily), 
`schemas_match` may stay true and skip recomputation, leaving the `Union` 
schema stale/incorrect. Consider validating the cached schema against *each* 
input schema (or against the recomputed union schema) before deciding to keep 
it.



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
             }
             LogicalPlan::Union(Union { inputs, schema }) => {
                 let first_input_schema = inputs[0].schema();
-                if schema.fields().len() == first_input_schema.fields().len() {
-                    // If inputs are not pruned do not change schema
+                // Check field count AND field types AND field 
names/qualifiers.
+                // A width-only check misses cases where inputs were rewritten 
with
+                // different types or aliases (e.g. after type-coercion 
rewrites).
+                let schemas_match = schema.fields().len() == 
first_input_schema.fields().len()
+                    && (0..schema.fields().len()).all(|i| {
+                        let (q1, f1) = schema.qualified_field(i);
+                        let (q2, f2) = first_input_schema.qualified_field(i);
+                        q1 == q2
+                            && f1.data_type() == f2.data_type()
+                            && f1.name() == f2.name()

Review Comment:
   The structural equality check ignores other field properties that can affect 
correctness (notably nullability, and potentially metadata depending on how 
schemas are used). This can still incorrectly treat schemas as matching and 
skip recomputation. Prefer a full field equality comparison (or additionally 
compare `f1.is_nullable()` and any other required properties) so 
`recompute_schema()` reflects changes in nullability/metadata introduced by 
rewrites.



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -709,8 +709,20 @@ impl LogicalPlan {
             }
             LogicalPlan::Union(Union { inputs, schema }) => {
                 let first_input_schema = inputs[0].schema();
-                if schema.fields().len() == first_input_schema.fields().len() {
-                    // If inputs are not pruned do not change schema
+                // Check field count AND field types AND field 
names/qualifiers.
+                // A width-only check misses cases where inputs were rewritten 
with
+                // different types or aliases (e.g. after type-coercion 
rewrites).
+                let schemas_match = schema.fields().len() == 
first_input_schema.fields().len()
+                    && (0..schema.fields().len()).all(|i| {

Review Comment:
   Minor: `schema.fields().len()` is computed multiple times and the 
index-based loop makes it harder to see what's being compared. Consider storing 
the length in a local (e.g., `let len = ...`) and/or iterating with `zip` over 
the corresponding qualified fields to make the comparison clearer and avoid 
repeated length queries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to