neilconway commented on code in PR #20990:
URL: https://github.com/apache/datafusion/pull/20990#discussion_r2997303222
##########
datafusion/sql/tests/sql_integration.rs:
##########
@@ -4847,6 +4847,40 @@ fn test_using_join_wildcard_schema() {
);
}
+#[test]
+fn test_using_join_wildcard_schema_semi_anti() {
+ let left_expected = vec!["s.x1".to_string(), "s.x2".to_string(),
"s.x3".to_string()];
+ let right_expected = vec!["t.x1".to_string(), "t.x2".to_string(),
"t.x3".to_string()];
+
+ let sql = "WITH
+ s AS (SELECT 1 AS x1, 2 AS x2, 3 AS x3),
+ t AS (SELECT 1 AS x1, 4 AS x2, 5 AS x3)
+ SELECT * FROM s LEFT SEMI JOIN t USING (x1)";
Review Comment:
This test (and the other `LEFT ANTI JOIN` test) should succeed without this
PR, right? Because the correct side of the join happens to be picked. Can we
change this (or add a new test variant) that checks that the behavior has
changed for `LEFT` joins?
##########
datafusion/expr/src/utils.rs:
##########
Review Comment:
It took me a while to wrap my head around the logic here; what do you think
of this instead?
```suggestion
/// When a JOIN has a USING clause, the join columns appear in the output
/// schema once per side (for inner/outer joins) or once total (for
semi/anti
/// joins). An unqualified wildcard should include each USING column only
once.
/// This function returns the duplicate columns that should be excluded.
fn exclude_using_columns(plan: &LogicalPlan) -> Result<HashSet<Column>> {
let output_columns: HashSet<_> =
plan.schema().columns().iter().cloned().collect();
let mut excluded = HashSet::new();
for cols in plan.using_columns()? {
// `using_columns()` returns join columns from both sides
regardless
// of join type. For semi/anti joins, only one side's columns
appear
// in the output schema. Filter to output columns so that columns
// from the non-output side don't participate in deduplication and
// displace real output columns.
let mut cols: Vec<_> = cols
.into_iter()
.filter(|c| output_columns.contains(c))
.collect();
// Sort so we keep the same qualified column, regardless of HashSet
// iteration order.
cols.sort();
let mut seen = HashSet::new();
for col in cols {
if !seen.insert(col.name.clone()) {
excluded.insert(col);
}
}
}
Ok(excluded)
}
```
##########
datafusion/sql/tests/sql_integration.rs:
##########
@@ -4847,6 +4847,40 @@ fn test_using_join_wildcard_schema() {
);
}
+#[test]
+fn test_using_join_wildcard_schema_semi_anti() {
+ let left_expected = vec!["s.x1".to_string(), "s.x2".to_string(),
"s.x3".to_string()];
+ let right_expected = vec!["t.x1".to_string(), "t.x2".to_string(),
"t.x3".to_string()];
Review Comment:
```suggestion
let left_expected = &["s.x1", "s.x2", "s.x3"];
let right_expected = &["t.x1", "t.x2", "t.x3"];
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]