xudong963 commented on code in PR #21236:
URL: https://github.com/apache/datafusion/pull/21236#discussion_r3008180094


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2406,6 +2406,28 @@ impl SubqueryAlias {
         let aliases = unique_field_aliases(plan.schema().fields());
         let is_projection_needed = aliases.iter().any(Option::is_some);
 
+        // Collect the set of unqualified field names that are ambiguous in 
this

Review Comment:
   Here marks a name as ambiguous when `unique_field_aliases` provides a rename 
for it. But `unique_field_aliases` renames *all* duplicates — so if there are 3 
columns named `age`, the 2nd and 3rd get renamed. 
   
   The code collects `field.name()` for each renamed field, which means it 
collects `"age"` twice and puts it in the set. That works correctly due to 
`HashSet` dedup, but the *first* `age` (which was NOT renamed) is also 
ambiguous and only ends up in the set because one of the later duplicates 
shares its name. This is coincidentally correct but fragile — if 
`unique_field_aliases` ever changed to rename ALL duplicates (including the 
first), or if it renamed to something other than `name:N`, the logic could 
break. 🤔 
   
   A cleaner approach: count occurrences of each name and mark any name 
appearing 2+ times.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to