[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #4840: Support wildcard select on multiple column using joins

GitBox Sat, 07 Jan 2023 00:55:39 -0800


Jefffrey commented on code in PR #4840:
URL: https://github.com/apache/arrow-datafusion/pull/4840#discussion_r1063978094



##########
datafusion/expr/src/utils.rs:
##########
@@ -150,13 +150,23 @@ pub fn expand_wildcard(schema: &DFSchema, plan: &LogicalPlan) -> Result<Vec<Expr
     let using_columns = plan.using_columns()?;
     let columns_to_skip = using_columns
         .into_iter()
-        // For each USING JOIN condition, only expand to one column in projection
+        // For each USING JOIN condition, only expand to one of each join column in projection
         .flat_map(|cols| {
             let mut cols = cols.into_iter().collect::<Vec<_>>();
             // sort join columns to make sure we consistently keep the same
             // qualified column
             cols.sort();
-            cols.into_iter().skip(1)
+            let mut out_column_names: HashSet<String> = HashSet::new();
+            cols.into_iter()
+                .filter_map(|c| {
+                    if out_column_names.contains(&c.name) {
+                        Some(c)
+                    } else {
+                        out_column_names.insert(c.name);
+                        None
+                    }
+                })
+                .collect::<Vec<_>>()

Review Comment:
   The main fix is here. Previously the code skipped only the first column
   after sorting, which assumed a USING join on a single column. It now keeps
   track of which column names have already been kept and skips any repeats,
   so that only one set of the join columns is output.
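
To illustrate the difference, here is a minimal standalone sketch of the same deduplication logic. It is not the DataFusion code itself: `Column` is a simplified stand-in with only a qualifier and a name, and `col` and `columns_to_skip` are hypothetical helper names introduced for this example. It assumes, as the diff does, that sorting orders columns consistently (here by qualifier, then name).

```rust
use std::collections::HashSet;

// Simplified stand-in for a qualified column (assumption: not the real type).
// Derived ordering compares `relation` first, then `name`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Column {
    relation: Option<String>,
    name: String,
}

// Hypothetical convenience constructor for the example.
fn col(relation: &str, name: &str) -> Column {
    Column {
        relation: Some(relation.to_string()),
        name: name.to_string(),
    }
}

// For each USING join, keep the first column seen for every distinct name
// and return all later columns with the same name as "to skip".
fn columns_to_skip(using_columns: Vec<HashSet<Column>>) -> HashSet<Column> {
    using_columns
        .into_iter()
        .flat_map(|cols| {
            let mut cols = cols.into_iter().collect::<Vec<_>>();
            // Sort so the same qualified column is kept on every run.
            cols.sort();
            let mut seen: HashSet<String> = HashSet::new();
            cols.into_iter()
                // `insert` returns false when the name was already present,
                // so only repeated names pass the filter.
                .filter(|c| !seen.insert(c.name.clone()))
                .collect::<Vec<_>>()
        })
        .collect()
}
```

For a join of `t1` and `t2` with USING on `id` and `name`, the sorted columns are `t1.id`, `t1.name`, `t2.id`, `t2.name`. This sketch skips `t2.id` and `t2.name`, whereas `skip(1)` would also have skipped `t1.name`, dropping it from the wildcard expansion.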



