kosiew opened a new pull request, #21617:
URL: https://github.com/apache/datafusion/pull/21617
## Which issue does this PR close?
* Part of #21616
---
## Rationale for this change
The SQL planner previously performed duplicate relation alias validation
inconsistently across different code paths. In particular:
* Explicit joins could bypass duplicate alias checks entirely
* Unaliased tables could collide due to simplistic key generation
* Error diagnostics lacked source location information, making debugging
difficult
This led to confusing or missing errors for users and inconsistent behavior
depending on query structure.
This PR introduces a centralized relation-scope binding mechanism within the
planner to ensure consistent validation and improved diagnostics.
---
## What changes are included in this PR?
* Introduced `RelationScope` and `RelationBinding` to track relation names
and aliases within scoped FROM clauses
* Added relation scope management to `PlannerContext`, including:
  * `with_new_relation_scope`
  * `clear_relation_scopes`
  * `insert_relation_binding`
* Implemented duplicate relation detection using scoped bindings with
span-aware diagnostics
* Registered relation bindings during planning for:
  * Base tables
  * Aliases
  * Joins (including explicit joins)
* Ensured nested joins and subqueries do not leak relation scopes
* Updated query planning to reset relation scopes for subqueries
* Refactored nested join handling to respect scope boundaries
* Added detailed diagnostics with source span references for duplicate alias
errors
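
As a rough illustration of the scoped-binding idea, the sketch below models a stack of relation scopes with duplicate detection. It is not the code from this PR: the type and method names are borrowed from the list above, but every signature, field, the `Span` type, and the error text are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical source location of a relation in the SQL text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    line: u32,
    column: u32,
}

/// One relation name or alias, plus where it was introduced (if known).
#[derive(Debug, Clone)]
struct RelationBinding {
    name: String,
    span: Option<Span>,
}

/// All bindings visible in a single FROM clause.
#[derive(Debug, Default)]
struct RelationScope {
    bindings: HashMap<String, RelationBinding>,
}

/// Stand-in for the planner context: a stack of relation scopes.
#[derive(Debug, Default)]
struct PlannerContext {
    relation_scopes: Vec<RelationScope>,
}

/// Render a span as `line:column`, or a placeholder when it is missing.
fn describe(span: Option<Span>) -> String {
    match span {
        Some(s) => format!("{}:{}", s.line, s.column),
        None => "unknown location".to_string(),
    }
}

impl PlannerContext {
    /// Run `f` inside a fresh scope and pop it afterwards, so names bound
    /// while planning a subquery are dropped when it ends.
    fn with_new_relation_scope<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> T {
        self.relation_scopes.push(RelationScope::default());
        let out = f(self);
        self.relation_scopes.pop();
        out
    }

    /// Drop every scope, e.g. before planning an unrelated query.
    fn clear_relation_scopes(&mut self) {
        self.relation_scopes.clear();
    }

    /// Bind `name` in the innermost scope; on a duplicate, return an error
    /// that mentions both the first and the second location.
    fn insert_relation_binding(&mut self, name: &str, span: Option<Span>) -> Result<(), String> {
        if self.relation_scopes.is_empty() {
            self.relation_scopes.push(RelationScope::default());
        }
        let scope = self.relation_scopes.last_mut().expect("scope pushed above");
        if let Some(first) = scope.bindings.get(name) {
            return Err(format!(
                "duplicate relation name or alias '{}' (first bound at {}, bound again at {})",
                first.name,
                describe(first.span),
                describe(span)
            ));
        }
        let binding = RelationBinding { name: name.to_string(), span };
        scope.bindings.insert(name.to_string(), binding);
        Ok(())
    }
}
```

In this sketch, binding the same name twice in one scope fails, while binding it again inside `with_new_relation_scope` succeeds because the inner scope starts empty.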
---
## Are these changes tested?
Yes. This PR includes comprehensive tests covering:
* Duplicate aliases in explicit joins
* Duplicate aliases in comma joins
* Conflicts between unaliased table names and aliases
* Valid queries with distinct aliases
* Fully-qualified table names across schemas (ensuring no false positives)
* Scope isolation for subqueries and nested joins
These tests ensure both correctness and regression coverage for the new
behavior.
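
To make the tested cases concrete, here is a small sketch of one possible keying rule: an aliased relation is bound under its alias, and an unaliased one under its name as written, including any schema qualifier. The `TableRef` enum, `binding_key`, and `first_duplicate` are hypothetical names for this example, and the rule itself is an assumption; the key generation in the PR may differ.

```rust
use std::collections::HashSet;

/// Hypothetical table reference: bare (`t`) or schema-qualified (`s.t`).
#[derive(Debug, Clone)]
enum TableRef {
    Bare { table: String },
    Partial { schema: String, table: String },
}

/// Key under which a FROM item is bound: the alias when one is given,
/// otherwise the table name exactly as qualified in the query.
fn binding_key(table: &TableRef, alias: Option<&str>) -> String {
    match (alias, table) {
        (Some(a), _) => a.to_string(),
        (None, TableRef::Bare { table }) => table.clone(),
        (None, TableRef::Partial { schema, table }) => format!("{schema}.{table}"),
    }
}

/// Return the first binding key that appears twice in a FROM list, if any.
fn first_duplicate(items: &[(TableRef, Option<&str>)]) -> Option<String> {
    let mut seen = HashSet::new();
    for (table, alias) in items {
        let key = binding_key(table, *alias);
        // `insert` returns false when the key was already present.
        if !seen.insert(key.clone()) {
            return Some(key);
        }
    }
    None
}
```

Under this rule, `FROM s1.t, s2.t` yields the distinct keys `s1.t` and `s2.t`, so it is not flagged, whereas `FROM a AS b, b` yields `b` twice and is reported as a conflict between an alias and an unaliased table name.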
---
## Are there any user-facing changes?
Yes.
* Queries with duplicate relation aliases (previously allowed in some cases)
will now correctly return planning errors
* Error messages are improved and now include more precise diagnostics,
including source locations where available
There are no breaking API changes, but stricter validation may cause
previously accepted invalid queries to fail.
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
---
## Additional Notes
This change lays the groundwork for further centralization of name
resolution and validation logic in the SQL planner, potentially extending to
CTEs and other relation sources in future work.