notfilippo opened a new pull request, #17478:
URL: https://github.com/apache/datafusion/pull/17478
## Which issue does this PR close?
- Closes #17405.
## Rationale for this change
When nested `SubqueryAlias` nodes were created in complex joins, DataFusion
resolved column name conflicts by appending suffixes such as `:1` to duplicate
column names. The physical planner then failed with "Input field name {} does
not match with the projection expression {}" errors, because the optimizer
could not match columns against these modified names.
The root cause was that the `SubqueryAlias` creation process was stripping
qualification information and mixing columns from left and right sides of
joins, leading to name collisions that were resolved by adding numeric
suffixes. This approach lost important context needed for proper column
resolution.
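
To make the failure mode concrete, here is a minimal toy model of the old renaming behaviour. It is only a sketch: `ToyColumn`, `toy_column`, and `strip_and_suffix` are hypothetical names invented for illustration, not DataFusion types or functions, and the real code operated on schemas rather than plain vectors.

```rust
use std::collections::HashMap;

/// Toy column: an optional relation qualifier plus a field name.
/// Illustrative only; this is not DataFusion's `Column` type.
#[derive(Debug, Clone, PartialEq)]
struct ToyColumn {
    qualifier: Option<String>,
    name: String,
}

/// Hypothetical helper that builds a qualified toy column.
fn toy_column(qualifier: &str, name: &str) -> ToyColumn {
    ToyColumn {
        qualifier: Some(qualifier.to_string()),
        name: name.to_string(),
    }
}

/// Assumed stand-in for the old approach: ignore the qualifier, then make
/// the remaining bare names unique by appending `:N` to later duplicates.
fn strip_and_suffix(columns: &[ToyColumn]) -> Vec<String> {
    // Number of times each bare name has been seen so far.
    let mut seen: HashMap<&str, usize> = HashMap::new();
    columns
        .iter()
        .map(|column| {
            let count = seen.entry(column.name.as_str()).or_insert(0);
            let output = if *count == 0 {
                column.name.clone()
            } else {
                format!("{}:{}", column.name, *count)
            };
            *count += 1;
            output
        })
        .collect()
}
```

In this model, the join output `left.id`, `right.id`, `right.name` becomes `id`, `id:1`, `name`. The name `id:1` no longer records that it came from `right.id`, which is the kind of lost context described above.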
## What changes are included in this PR?
- Replaced the hacky column renaming approach in `SubqueryAlias` with a
projection-based solution
- Added `maybe_project_redundant_column` function that creates explicit
projections with aliases when needed, instead of modifying column names directly
- Removed the `maybe_fix_physical_column_name` function from the physical
planner that was attempting to fix these naming issues downstream
- Updated `SubqueryAlias::try_new` to use the new projection approach,
preserving qualification information properly
- Added a test case demonstrating the fix for nested subquery alias scenarios
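
The projection-based idea from the list above can be sketched with a toy model. This is not the code of `maybe_project_redundant_column`: the types `SourceColumn` and `ProjectedExpr`, the function `project_duplicates`, and the `name:N` alias scheme are all assumptions made for illustration. The point it shows is that each projection expression keeps the original qualified column as its input, and only the output name is aliased.

```rust
use std::collections::HashSet;

/// Toy column reference (hypothetical, not DataFusion's `Column`).
#[derive(Debug, Clone, PartialEq)]
struct SourceColumn {
    qualifier: Option<String>,
    name: String,
}

/// One projection expression: the untouched input column plus an optional alias.
#[derive(Debug, Clone, PartialEq)]
struct ProjectedExpr {
    input: SourceColumn,
    alias: Option<String>,
}

impl ProjectedExpr {
    /// Name this expression exposes to the parent plan node.
    fn output_name(&self) -> &str {
        self.alias.as_deref().unwrap_or(self.input.name.as_str())
    }
}

/// Hypothetical helper that builds a qualified toy column.
fn source_column(qualifier: &str, name: &str) -> SourceColumn {
    SourceColumn {
        qualifier: Some(qualifier.to_string()),
        name: name.to_string(),
    }
}

/// Returns `None` when every bare name is already unique (no projection
/// needed). Otherwise returns a projection in which later duplicates get
/// an alias, while every input column reference stays fully qualified.
fn project_duplicates(columns: &[SourceColumn]) -> Option<Vec<ProjectedExpr>> {
    // All output names in use, so a generated alias never collides.
    let mut taken: HashSet<String> = columns.iter().map(|c| c.name.clone()).collect();
    if taken.len() == columns.len() {
        return None;
    }
    let mut seen: HashSet<&str> = HashSet::new();
    let mut exprs = Vec::with_capacity(columns.len());
    for column in columns {
        let alias = if seen.insert(column.name.as_str()) {
            None // first occurrence keeps its name
        } else {
            // Assumed alias scheme: try `name:1`, `name:2`, ... until unused.
            let mut n = 1;
            let mut candidate = format!("{}:{}", column.name, n);
            while taken.contains(&candidate) {
                n += 1;
                candidate = format!("{}:{}", column.name, n);
            }
            taken.insert(candidate.clone());
            Some(candidate)
        };
        exprs.push(ProjectedExpr { input: column.clone(), alias });
    }
    Some(exprs)
}
```

Because the alias lives in an explicit projection rather than in a rewritten field name, a later stage in this model can still see which qualified column produced each output name.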
## Are these changes tested?
The changes include a new test case `subquery_alias_confusing_the_optimizer`
that reproduces the original issue and verifies the fix works correctly.
**Note: The newly added function `maybe_project_redundant_column` is missing
comprehensive tests.**
## Are there any user-facing changes?
No changes to the public API or to the behavior of queries that already
planned successfully. This is an internal fix: complex nested join queries
that previously hit the planning error above now plan correctly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]