Blizzara opened a new pull request, #11329:
URL: https://github.com/apache/datafusion/pull/11329

   ## Which issue does this PR close?
   
   Related to https://github.com/apache/datafusion/issues/10815, and follows up on https://github.com/apache/datafusion/pull/11049#issuecomment-2185795038. Some cases may still be uncovered, but this change handles more of them.
   
   Closes #.
   
   ## Rationale for this change
   
   DataFusion (DF) requires column names to be unique, while Substrait doesn't care about names at all (apart from the first and last levels of the plan). This causes name clashes in intermediate nodes, which prevent DF from executing the plans.
   
   A simple example of a plan that would not work is:
   ```sql
   SELECT a + 1 as sum_a, a + 1 as sum_a_2 FROM data
   ```
   
   This fails because the Substrait consumer first creates a project that adds the columns, and only then creates the final project (or modifies the existing one) to rename them. When the first project is created, it contains two identical expressions without aliases, so their names collide and `validate_unique_names()` fails.
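   To illustrate the failure mode, here is a minimal, self-contained sketch of a name-uniqueness check in the spirit of `validate_unique_names()`. It is not DataFusion's implementation: the function `check_unique_names` is hypothetical, and the name string `data.a + Int64(1)` is only an illustrative stand-in for whatever display name the two unaliased `a + 1` expressions would share.

   ```rust
   use std::collections::HashSet;

   /// Hypothetical stand-in for a uniqueness check (not DataFusion's code):
   /// returns an error naming the first expression name seen twice.
   fn check_unique_names(names: &[&str]) -> Result<(), String> {
       let mut seen = HashSet::new();
       for name in names {
           // `HashSet::insert` returns false when the value was already present.
           if !seen.insert(*name) {
               return Err(format!("duplicate expression name: {name}"));
           }
       }
       Ok(())
   }
   ```

   Fed the two identical, unaliased expressions of the first project, the check rejects them; after the final rename (`sum_a`, `sum_a_2`) the same check would pass, but that step is never reached.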
   
   ## What changes are included in this PR?
   
   When consuming Substrait `Project`s, rename expressions as required to 
prevent duplicates.
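   A minimal sketch of the renaming idea, assuming a simple suffixing scheme: the first occurrence of a name is kept and later duplicates get a numeric suffix. The helper `dedup_names` and the `__temp__N` suffix format are hypothetical, chosen for illustration; this is not necessarily the naming scheme the PR uses.

   ```rust
   use std::collections::HashSet;

   /// Hypothetical helper: returns the names with every repeat suffixed so
   /// that all returned names are unique. First occurrences are unchanged.
   fn dedup_names(names: &[&str]) -> Vec<String> {
       let mut used: HashSet<String> = HashSet::new();
       let mut out = Vec::with_capacity(names.len());
       for &name in names {
           let mut candidate = name.to_string();
           let mut counter = 1;
           // Bump the suffix until the candidate is unused, so a generated
           // name cannot clash with a name that is already taken.
           while used.contains(&candidate) {
               candidate = format!("{name}__temp__{counter}");
               counter += 1;
           }
           used.insert(candidate.clone());
           out.push(candidate);
       }
       out
   }
   ```

   Keeping first occurrences unchanged means projects without duplicates keep their original names; only the clashing expressions get placeholder names, which presumably are then replaced by the final rename step described above.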
   
   ## Are these changes tested?
   
   Yes: by a new unit test, plus the existing tests.
   
   
   ## Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

