Jefffrey commented on code in PR #4840:
URL: https://github.com/apache/arrow-datafusion/pull/4840#discussion_r1063978094
##########
datafusion/expr/src/utils.rs:
##########
@@ -150,13 +150,23 @@ pub fn expand_wildcard(schema: &DFSchema, plan: &LogicalPlan) -> Result<Vec<Expr
let using_columns = plan.using_columns()?;
let columns_to_skip = using_columns
.into_iter()
- // For each USING JOIN condition, only expand to one column in projection
+ // For each USING JOIN condition, only expand to one of each join column in projection
.flat_map(|cols| {
let mut cols = cols.into_iter().collect::<Vec<_>>();
// sort join columns to make sure we consistently keep the same
// qualified column
cols.sort();
- cols.into_iter().skip(1)
+ let mut out_column_names: HashSet<String> = HashSet::new();
+ cols.into_iter()
+ .filter_map(|c| {
+ if out_column_names.contains(&c.name) {
+ Some(c)
+ } else {
+ out_column_names.insert(c.name);
+ None
+ }
+ })
+ .collect::<Vec<_>>()
Review Comment:
The main fix is here. Instead of just calling `skip(1)` to get past the first
column (which assumes a USING join on a single column), the code now tracks
which column names have already been seen and marks any later column with the
same name as one to skip, so only one set of the join columns is output.
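
To illustrate the behaviour described above, here is a minimal standalone sketch of the same deduplication idea. The `Column` struct is a hypothetical stand-in for DataFusion's own type, and `columns_to_skip` is an illustrative helper name, not a function from the PR. The sketch assumes each USING join contributes one set holding both sides of every join column.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for DataFusion's `Column`; only the fields needed here.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

// Illustrative helper (not part of the PR): for each USING join, keep the
// first column seen for every column name and return all the others.
pub fn columns_to_skip(using_columns: Vec<HashSet<Column>>) -> HashSet<Column> {
    using_columns
        .into_iter()
        .flat_map(|cols| {
            let mut cols = cols.into_iter().collect::<Vec<_>>();
            // Sort so the same qualified column is kept on every run.
            cols.sort();
            let mut seen: HashSet<String> = HashSet::new();
            cols.into_iter()
                // `insert` returns false when the name was already present,
                // so only repeated names pass the filter and get skipped.
                .filter(|c| !seen.insert(c.name.clone()))
                .collect::<Vec<_>>()
        })
        .collect()
}
```

For a join such as `t1 JOIN t2 USING (a, b)`, the sorted set is `t1.a, t1.b, t2.a, t2.b`, so this sketch would skip `t2.a` and `t2.b` and keep both `t1` columns. With the old `skip(1)` approach, `t1.b` would have been skipped as well.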