kosiew commented on code in PR #20720:
URL: https://github.com/apache/datafusion/pull/20720#discussion_r3079091677
##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
self.plan_table_with_joins(input, planner_context)
}
_ => {
+ let extract_table_name =
+ |t: &TableWithJoins| -> Option<(String, Option<Span>)> {
+ match &t.relation {
+ TableFactor::Table { alias: Some(a), .. }
+ | TableFactor::Derived { alias: Some(a), .. }
Review Comment:
This `extract_table_name` closure is doing a few different things at once,
like relation name normalization, alias span lookup, and relation kind
filtering.
Once the JOIN coverage is addressed, would it be worth pulling this into a
small helper near relation planning? It might make the naming rules easier to
follow and extend as more table factor variants are added.
##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
self.plan_table_with_joins(input, planner_context)
Review Comment:
It looks like the duplicate alias diagnostic is only applied in the
multi-entry comma `FROM` path. A single `TableWithJoins` still goes straight to
`plan_table_with_joins`, so explicit joins skip this check.
For example, `(SELECT 1 AS a) AS t JOIN (SELECT 2 AS b) AS t ON true` would
not be caught here. Depending on the columns, this could either fall back to a
schema error or even plan successfully with duplicate `t` aliases.
Could we move the alias and span tracking into the relation or join planning
path so both comma joins and explicit JOINs go through the same validation and
produce consistent diagnostics?
##########
datafusion/sql/src/select.rs:
##########
@@ -690,21 +694,75 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
self.plan_table_with_joins(input, planner_context)
}
_ => {
+ let extract_table_name =
+ |t: &TableWithJoins| -> Option<(String, Option<Span>)> {
+ match &t.relation {
+ TableFactor::Table { alias: Some(a), .. }
+ | TableFactor::Derived { alias: Some(a), .. }
+ | TableFactor::Function { alias: Some(a), .. }
+ | TableFactor::UNNEST { alias: Some(a), .. }
+ | TableFactor::NestedJoin { alias: Some(a), .. }
=> {
+ let span =
Span::try_from_sqlparser_span(a.name.span);
+ let name =
+
self.ident_normalizer.normalize(a.name.clone());
+ Some((name, span))
+ }
+ TableFactor::Table {
+ name, alias: None, ..
+ } => {
+ let span =
+
Span::try_from_sqlparser_span(t.relation.span());
+ let table_name = name
+ .0
+ .iter()
+ .filter_map(|p| p.as_ident())
+ .map(|id|
self.ident_normalizer.normalize(id.clone()))
+ .next_back()?;
+ Some((table_name, span))
+ }
+ _ => None,
+ }
+ };
+
+ let mut alias_spans: HashMap<String, Option<Span>> =
HashMap::new();
Review Comment:
I think this currently extracts only the final identifier for unaliased
table factors. That means something like `catalog1.schema.person` and
`catalog2.schema.person` would both be tracked as `person` and get flagged as
duplicates.
However, DataFusion's scan qualifier uses the full normalized
`TableReference`, so these should actually be treated as distinct.
Would it make sense to reuse
`object_name_to_table_reference(name.clone())?.to_string()` here, or otherwise
preserve the full relation identity for unaliased tables?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]