notfilippo opened a new pull request, #17478:
URL: https://github.com/apache/datafusion/pull/17478

   ## Which issue does this PR close?
   
   - Closes #17405.
   
   ## Rationale for this change
   
   When creating nested `SubqueryAlias` operations in complex joins, DataFusion 
was incorrectly handling column name conflicts by appending suffixes like `:1` 
to duplicate column names. This caused the physical planner to fail with "Input 
field name {} does not match with the projection expression {}" errors, as the 
optimizer couldn't properly match columns with these modified names.
   
   The root cause was that the `SubqueryAlias` creation process was stripping 
qualification information and mixing columns from left and right sides of 
joins, leading to name collisions that were resolved by adding numeric 
suffixes. This approach lost important context needed for proper column 
resolution.
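
For illustration only, the collision described above can be reproduced with a small standalone sketch. It does not use DataFusion types: the `(qualifier, name)` tuples and the `strip_and_suffix` helper are hypothetical stand-ins, and the exact suffix scheme is assumed from the `:1` example mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not DataFusion code): drops the qualifier of each
/// `(qualifier, name)` pair and makes repeated names unique by appending `:N`.
fn strip_and_suffix(columns: &[(&str, &str)]) -> Vec<String> {
    // Number of times each unqualified name has been seen so far.
    let mut seen: HashMap<&str, usize> = HashMap::new();
    columns
        .iter()
        .map(|&(_qualifier, name)| {
            let count = seen.entry(name).or_insert(0);
            let renamed = if *count == 0 {
                // First occurrence keeps its plain name.
                name.to_string()
            } else {
                // Later occurrences get a numeric suffix; the qualifier is lost.
                format!("{name}:{count}")
            };
            *count += 1;
            renamed
        })
        .collect()
}
```

Calling `strip_and_suffix(&[("left", "id"), ("left", "value"), ("right", "id")])` yields `id`, `value`, `id:1`. The third name no longer records that it came from `right.id`, which is the kind of lost context the rationale describes.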
   
   ## What changes are included in this PR?
   
   - Replaced the hacky column renaming approach in `SubqueryAlias` with a 
projection-based solution
   - Added a `maybe_project_redundant_column` function that creates explicit 
projections with aliases when needed, instead of modifying column names directly
   - Removed the `maybe_fix_physical_column_name` function from the physical 
planner that was attempting to fix these naming issues downstream
   - Updated `SubqueryAlias::try_new` to use the new projection approach, 
preserving qualification information properly
   - Added a test case demonstrating the fix for nested subquery alias scenarios
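
As a rough sketch of the projection-based idea in the list above (not the PR's actual implementation), the snippet below keeps every column's qualifier and only wraps repeated names in an explicit alias. `ProjExpr` and `project_redundant_columns` are hypothetical names invented for this example, and the alias naming scheme is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified projection expression; not a DataFusion type.
#[derive(Debug, PartialEq)]
enum ProjExpr {
    /// Pass the qualified column through unchanged.
    Column { qualifier: String, name: String },
    /// Expose the qualified column under a new output name.
    Alias { qualifier: String, name: String, alias: String },
}

/// Returns `None` when all unqualified names are already unique (no
/// projection needed), otherwise a projection list that aliases each
/// repeated name while keeping the original qualifier on the input side.
fn project_redundant_columns(columns: &[(&str, &str)]) -> Option<Vec<ProjExpr>> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut needs_projection = false;
    let exprs = columns
        .iter()
        .map(|&(qualifier, name)| {
            let count = seen.entry(name).or_insert(0);
            *count += 1;
            if *count == 1 {
                ProjExpr::Column {
                    qualifier: qualifier.to_string(),
                    name: name.to_string(),
                }
            } else {
                needs_projection = true;
                ProjExpr::Alias {
                    qualifier: qualifier.to_string(),
                    name: name.to_string(),
                    // Assumed alias scheme, chosen only for this sketch.
                    alias: format!("{name}:{}", *count - 1),
                }
            }
        })
        .collect::<Vec<_>>();
    needs_projection.then_some(exprs)
}
```

For `[("left", "id"), ("right", "id")]` this produces a plain `left.id` column followed by `right.id` aliased as `id:1`, so the source qualifier stays visible to later planning steps. Input with no repeated names returns `None`, which mirrors the "maybe" in the function name.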
   
   ## Are these changes tested?
   
   The changes include a new test case `subquery_alias_confusing_the_optimizer` 
that reproduces the original issue and verifies the fix works correctly. 
**Note: The newly added function `maybe_project_redundant_column` is missing 
comprehensive tests.**
   
   ## Are there any user-facing changes?
   
   No user-facing changes. This is an internal fix that resolves query planning 
errors for complex nested join scenarios without changing the public API or 
query behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

