NGA-TRAN commented on code in PR #17478:
URL: https://github.com/apache/datafusion/pull/17478#discussion_r2356371040
##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -2787,4 +2765,33 @@ mod tests {
Ok(())
}
+
+ #[test]
+ fn test_unique_field_aliases() {
+ let t1_field_1 = Field::new("a", DataType::Int32, false);
+ let t2_field_1 = Field::new("a", DataType::Int32, false);
+ let t2_field_3 = Field::new("a", DataType::Int32, false);
+ let t2_field_4 = Field::new("a:1", DataType::Int32, false);
+ let t1_field_2 = Field::new("b", DataType::Int32, false);
+ let t2_field_2 = Field::new("b", DataType::Int32, false);
+
+ let fields = vec![
+ t1_field_1, t2_field_1, t1_field_2, t2_field_2, t2_field_3,
t2_field_4,
+ ];
+ let fields = Fields::from(fields);
+
+ let remove_redundant = unique_field_aliases(&fields);
+
+ assert_eq!(
+ remove_redundant,
+ vec![
+ None,
+ Some("a:1".to_string()),
+ None,
+ Some("b:1".to_string()),
+ Some("a:2".to_string()),
+ Some("a:1:1".to_string()),
+ ]
Review Comment:
`a, a, b, b, a, a:1` becomes `_, a:1, _, b:1, a:2, a:1:1`? Does it mean we
keep the last of the same contiguous values and add 1, 2, 3, ... into the same
but non contiguous?
Can you add this comment on top of the test?
##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -1517,37 +1518,42 @@ impl ValuesFields {
}
}
-// `name_map` tracks a mapping between a field name and the number of
appearances of that field.
-//
-// Some field names might already come to this function with the count (number
of times it appeared)
-// as a suffix e.g. id:1, so there's still a chance of name collisions, for
example,
-// if these three fields passed to this function: "col:1", "col" and "col",
the function
-// would rename them to -> col:1, col, col:1 causing a posteriror error when
building the DFSchema.
-// that's why we need the `seen` set, so the fields are always unique.
-//
-pub fn change_redundant_column(fields: &Fields) -> Vec<Field> {
- let mut name_map = HashMap::new();
- let mut seen: HashSet<String> = HashSet::new();
+// Return a list of aliases so that if applied to `fields` it would result in
a unique name for each
+// column, regardless of qualification. The returned vector length is equal to
the number of fields
+// as input and will optionally contain the alias that needs to be assigned to
the column in the
+// same position in order to maintain the uniqueness property.
Review Comment:
Can you add example in this comment? Something like you use in the test
below:
`a, a, b, b, a, a:1` becomes `_, a:1, _, b:1, a:2, a:1:1`
It is also good to explain why you do this. Maybe another more general
example for it?
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -3562,4 +3448,44 @@ digraph {
Ok(())
}
+
+ #[tokio::test]
+ async fn subquery_alias_confusing_the_optimizer() -> Result<()> {
Review Comment:
```suggestion
// Reproducer for DataFusion issue #17405:
//
// The following SQL is semantically invalid. Notably, the `SELECT
left_table.a, right_table.a` clause is missing from the explicit logical plan:
//
// SELECT a FROM (
// -- SELECT left_table.a, right_table.a
// FROM left_table
// FULL JOIN right_table ON left_table.a = right_table.a
// ) AS alias
// GROUP BY a;
//
// As a result, the variables within `alias` subquery are not properly
distinguished, which leads to a bug when converting the logical plan into a
physical plan.
//
// The proposed fix is to implicitly insert a ProjectionExec node to
represent the missing SELECT clause.
async fn subquery_alias_confusing_the_optimizer() -> Result<()> {
```
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -3562,4 +3448,44 @@ digraph {
Ok(())
}
+
+ #[tokio::test]
Review Comment:
For other reviewers, this is a reproducer of the bug. I have run this test
before the fix and it fails. Great job to reproduce this with logical and
physical plan @notfilippo.
I suggest you add the below comment so we know what the bug is
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]