kosiew commented on code in PR #20720:
URL: https://github.com/apache/datafusion/pull/20720#discussion_r3079091677


##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                 self.plan_table_with_joins(input, planner_context)
             }
             _ => {
+                let extract_table_name =
+                    |t: &TableWithJoins| -> Option<(String, Option<Span>)> {
+                        match &t.relation {
+                            TableFactor::Table { alias: Some(a), .. }
+                            | TableFactor::Derived { alias: Some(a), .. }

Review Comment:
   This `extract_table_name` closure is doing a few different things at once, 
like relation name normalization, alias span lookup, and relation kind 
filtering.
   
   Once the JOIN coverage is addressed, would it be worth pulling this into a 
small helper near relation planning? It might make the naming rules easier to 
follow and extend as more table factor variants are added.



##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                 self.plan_table_with_joins(input, planner_context)

Review Comment:
   It looks like the duplicate alias diagnostic is only applied in the 
multi-entry comma `FROM` path. A single `TableWithJoins` still goes straight to 
`plan_table_with_joins`, so explicit joins skip this check.
   
   For example, `(SELECT 1 AS a) AS t JOIN (SELECT 2 AS b) AS t ON true` would 
not be caught here. Depending on the columns, this could either fall back to a 
schema error or even plan successfully with duplicate `t` aliases.
   
   Could we move the alias and span tracking into the relation or join planning 
path so both comma joins and explicit JOINs go through the same validation and 
produce consistent diagnostics?



##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                 self.plan_table_with_joins(input, planner_context)
             }
             _ => {
+                let extract_table_name =
+                    |t: &TableWithJoins| -> Option<(String, Option<Span>)> {
+                        match &t.relation {
+                            TableFactor::Table { alias: Some(a), .. }
+                            | TableFactor::Derived { alias: Some(a), .. }
+                            | TableFactor::Function { alias: Some(a), .. }
+                            | TableFactor::UNNEST { alias: Some(a), .. }
+                            | TableFactor::NestedJoin { alias: Some(a), .. } 
=> {
+                                let span = 
Span::try_from_sqlparser_span(a.name.span);
+                                let name =
+                                    
self.ident_normalizer.normalize(a.name.clone());
+                                Some((name, span))
+                            }
+                            TableFactor::Table {
+                                name, alias: None, ..
+                            } => {
+                                let span =
+                                    
Span::try_from_sqlparser_span(t.relation.span());
+                                let table_name = name
+                                    .0
+                                    .iter()
+                                    .filter_map(|p| p.as_ident())
+                                    .map(|id| 
self.ident_normalizer.normalize(id.clone()))
+                                    .next_back()?;
+                                Some((table_name, span))
+                            }
+                            _ => None,
+                        }
+                    };
+
+                let mut alias_spans: HashMap<String, Option<Span>> = 
HashMap::new();

Review Comment:
   I think this currently extracts only the final identifier for unaliased 
table factors. That means something like `catalog1.schema.person` and 
`catalog2.schema.person` would both be tracked as `person` and get flagged as 
duplicates.
   
   However, DataFusion's scan qualifier uses the full normalized 
`TableReference`, so these should actually be treated as distinct.
   
   Would it make sense to reuse 
`object_name_to_table_reference(name.clone())?.to_string()` here, or otherwise 
preserve the full relation identity for unaliased tables?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to