xanderbailey commented on code in PR #17299:
URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323660433
##########
datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs:
##########
@@ -62,7 +62,17 @@ pub async fn from_project_rel(
// to transform it into a column reference
window_exprs.insert(e.clone());
}
- explicit_exprs.push(name_tracker.get_uniquely_named_expr(e)?);
+ // Since substrait removes aliases, we need to assign literals with a UUID alias to avoid
+ // ambiguous names when the same literal is used before and after a join.
+ // The name tracker will ensure that two literals in the same project would have
+ // unique names but, it does not ensure that if a literal column exists in a previous
+ // project say before a join that it is deduplicated with respect to those columns.
Review Comment:
Yes, so the name tracker makes sure that if a single project creates two null
string columns, those two columns get unique names. But say you create one of
those null columns before a join and then another in a project immediately
after the join: the plan fails with an ambiguous column error. There is one
UTF8(NULL) coming from, say, the left side of the join and another UTF8(NULL)
from the project after the join. The second one has no source, so a reference
to UTF8(NULL) is ambiguous.
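
To make the failure mode concrete, here is a minimal std-only sketch. It is a toy model and not DataFusion's actual API: `ProjectNameTracker`, `first_ambiguous`, and `alias_literal` are hypothetical names, the `__temp__` suffix is just an illustrative renaming scheme, and a plain counter stands in for the UUID alias mentioned in the diff.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a per-project name tracker: it only knows
/// about the names it has handed out within one projection.
struct ProjectNameTracker {
    seen: HashSet<String>,
}

impl ProjectNameTracker {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns `name` if unused in this project, otherwise a suffixed variant.
    fn unique_name(&mut self, name: &str) -> String {
        let mut candidate = name.to_string();
        let mut n = 1;
        while !self.seen.insert(candidate.clone()) {
            candidate = format!("{name}__temp__{n}");
            n += 1;
        }
        candidate
    }
}

/// Returns the first unqualified column name that occurs twice, if any.
fn first_ambiguous(columns: &[String]) -> Option<String> {
    let mut seen = HashSet::new();
    for c in columns {
        if !seen.insert(c.as_str()) {
            return Some(c.clone());
        }
    }
    None
}

/// Assumption: a counter-based alias standing in for a UUID alias.
fn alias_literal(name: &str, id: u64) -> String {
    format!("{name}__{id}")
}

fn main() {
    // Project before the join: its own tracker, one null-string literal.
    let mut before = ProjectNameTracker::new();
    let left = vec![before.unique_name("UTF8(NULL)")];

    // Project after the join: a fresh tracker that never saw the left side.
    let mut after = ProjectNameTracker::new();
    let added = after.unique_name("UTF8(NULL)");

    // Each tracker was satisfied on its own, yet the combined schema clashes.
    let clashing = vec![left[0].clone(), added];
    assert_eq!(first_ambiguous(&clashing), Some("UTF8(NULL)".to_string()));

    // Giving every literal a globally unique alias avoids the clash.
    let aliased = vec![alias_literal("UTF8(NULL)", 1), alias_literal("UTF8(NULL)", 2)];
    assert_eq!(first_ambiguous(&aliased), None);
}
```

The point of the sketch is only that uniqueness enforced per project says nothing about names inherited from an earlier project, which is why the literals need an alias that is unique across the whole plan.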
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.