NGA-TRAN commented on code in PR #17478:
URL: https://github.com/apache/datafusion/pull/17478#discussion_r2356371040


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -2787,4 +2765,33 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn test_unique_field_aliases() {
+        let t1_field_1 = Field::new("a", DataType::Int32, false);
+        let t2_field_1 = Field::new("a", DataType::Int32, false);
+        let t2_field_3 = Field::new("a", DataType::Int32, false);
+        let t2_field_4 = Field::new("a:1", DataType::Int32, false);
+        let t1_field_2 = Field::new("b", DataType::Int32, false);
+        let t2_field_2 = Field::new("b", DataType::Int32, false);
+
+        let fields = vec![
+            t1_field_1, t2_field_1, t1_field_2, t2_field_2, t2_field_3, 
t2_field_4,
+        ];
+        let fields = Fields::from(fields);
+
+        let remove_redundant = unique_field_aliases(&fields);
+
+        assert_eq!(
+            remove_redundant,
+            vec![
+                None,
+                Some("a:1".to_string()),
+                None,
+                Some("b:1".to_string()),
+                Some("a:2".to_string()),
+                Some("a:1:1".to_string()),
+            ]

Review Comment:
   `a, a, b, b, a, a:1` becomes `_, a:1, _, b:1, a:2, a:1:1`?  Does it mean  we 
keep the last of the same contiguous values and add 1, 2, 3, ... into the same 
but non contiguous?
   Can you add this comment on top of the test?



##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -1517,37 +1518,42 @@ impl ValuesFields {
     }
 }
 
-// `name_map` tracks a mapping between a field name and the number of 
appearances of that field.
-//
-// Some field names might already come to this function with the count (number 
of times it appeared)
-// as a suffix e.g. id:1, so there's still a chance of name collisions, for 
example,
-// if these three fields passed to this function: "col:1", "col" and "col", 
the function
-// would rename them to -> col:1, col, col:1 causing a posteriror error when 
building the DFSchema.
-// that's why we need the `seen` set, so the fields are always unique.
-//
-pub fn change_redundant_column(fields: &Fields) -> Vec<Field> {
-    let mut name_map = HashMap::new();
-    let mut seen: HashSet<String> = HashSet::new();
+// Return a list of aliases so that if applied to `fields` it would result in 
a unique name for each
+// column, regardless of qualification. The returned vector length is equal to 
the number of fields
+// as input and will optionally contain the alias that needs to be assigned to 
the column in the
+// same position in order to maintain the uniqueness property.

Review Comment:
   Can you add example in this comment? Something like you use in the test 
below:
   `a, a, b, b, a, a:1` becomes `_, a:1, _, b:1, a:2, a:1:1`
   
   It is also good to explain why you do this. Maybe another more general 
example for it?



##########
datafusion/core/src/physical_planner.rs:
##########
@@ -3562,4 +3448,44 @@ digraph {
 
         Ok(())
     }
+
+    #[tokio::test]
+    async fn subquery_alias_confusing_the_optimizer() -> Result<()> {

Review Comment:
   ```suggestion
       // Reproducer for DataFusion issue #17405:
       // 
       // The following SQL is semantically invalid. Notably, the `SELECT 
left_table.a, right_table.a` clause is missing from the explicit logical plan:
      //  
      // SELECT a FROM (
      //       -- SELECT left_table.a, right_table.a
      //       FROM left_table
      //       FULL JOIN right_table ON left_table.a = right_table.a
       // ) AS alias
       // GROUP BY a;
       // 
       // As a result, the variables within `alias` subquery are not properly 
distinguished, which leads to a bug when converting the logical plan into a 
physical plan.
      // 
      // The proposed fix is to implicitly insert a ProjectionExec node to 
represent the missing SELECT clause.
      
       async fn subquery_alias_confusing_the_optimizer() -> Result<()> {
   ```



##########
datafusion/core/src/physical_planner.rs:
##########
@@ -3562,4 +3448,44 @@ digraph {
 
         Ok(())
     }
+
+    #[tokio::test]

Review Comment:
   For other reviewers, this is a reproducer of the bug. I have run this test 
before the fix and it fails. Great job to reproduce this with logical and 
physical plan @notfilippo.
   
   I suggest you add the below comment so we know what the bug is



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to