kosiew opened a new pull request, #21617:
URL: https://github.com/apache/datafusion/pull/21617

   ## Which issue does this PR close?
   
   * Part of #21616
   
   ---
   
   ## Rationale for this change
   
   The SQL planner previously validated duplicate relation aliases 
inconsistently across code paths. In particular:
   
   * Explicit joins could bypass duplicate alias checks entirely
   * Unaliased tables could collide due to simplistic key generation
   * Error diagnostics lacked source location information, making debugging 
difficult
   
   This led to confusing or missing errors for users and inconsistent behavior 
depending on query structure.
   
   This PR introduces a centralized relation-scope binding mechanism within the 
planner to ensure consistent validation and improved diagnostics.
   
   ---
   
   ## What changes are included in this PR?
   
   * Introduced `RelationScope` and `RelationBinding` to track relation names 
and aliases within scoped FROM clauses
   * Added relation scope management to `PlannerContext`, including:
   
     * `with_new_relation_scope`
     * `clear_relation_scopes`
     * `insert_relation_binding`
   * Implemented duplicate relation detection using scoped bindings with 
span-aware diagnostics
   * Registered relation bindings during planning for:
   
     * Base tables
     * Aliases
     * Joins (including explicit joins)
   * Ensured nested joins and subqueries do not leak relation scopes
   * Updated query planning to reset relation scopes for subqueries
   * Refactored nested join handling to respect scope boundaries
   * Added detailed diagnostics with source span references for duplicate alias 
errors
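
The scoping idea in the list above can be illustrated with a small standalone sketch. It is not the code from this PR: `ScopeStack`, `Binding`, `Span`, and `DuplicateRelation` are hypothetical names chosen for illustration, and the sketch assumes a subquery or nested join is modelled by pushing a fresh scope that is popped on exit.

```rust
use std::collections::HashMap;

/// Hypothetical source location of a relation reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub line: u32,
    pub column: u32,
}

/// Hypothetical binding: the name a relation exposes and where it was written.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Binding {
    pub name: String,
    pub span: Option<Span>,
}

/// Error for a name bound twice in one scope. Both spans are kept so a
/// diagnostic can point at the first and the conflicting occurrence.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateRelation {
    pub name: String,
    pub first: Option<Span>,
    pub second: Option<Span>,
}

/// A stack of scopes; the innermost scope is the last element.
#[derive(Debug)]
pub struct ScopeStack {
    scopes: Vec<HashMap<String, Binding>>,
}

impl ScopeStack {
    /// Start with a single root scope.
    pub fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    /// Run `f` inside a fresh scope, then pop it so bindings do not leak
    /// into the enclosing scope (assumed model for subqueries).
    pub fn with_new_scope<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> T {
        self.scopes.push(HashMap::new());
        let out = f(self);
        self.scopes.pop();
        out
    }

    /// Bind a name in the innermost scope, rejecting duplicates.
    pub fn insert(&mut self, binding: Binding) -> Result<(), DuplicateRelation> {
        let scope = self.scopes.last_mut().expect("root scope always exists");
        if let Some(existing) = scope.get(&binding.name) {
            return Err(DuplicateRelation {
                name: binding.name,
                first: existing.span,
                second: binding.span,
            });
        }
        scope.insert(binding.name.clone(), binding);
        Ok(())
    }
}
```

In this sketch, binding `t` twice in the same scope yields an error carrying both spans, while binding `t` again inside `with_new_scope` succeeds because the inner scope starts empty.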
   
   ---
   
   ## Are these changes tested?
   
   Yes. This PR includes comprehensive tests covering:
   
   * Duplicate aliases in explicit joins
   * Duplicate aliases in comma joins
   * Conflicts between unaliased table names and aliases
   * Valid queries with distinct aliases
   * Fully-qualified table names across schemas (ensuring no false positives)
   * Scope isolation for subqueries and nested joins
   
   These tests ensure both correctness and regression coverage for the new 
behavior.
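
As a rough illustration of the tested cases, the sketch below derives a lookup key per relation and reports the first collision. The key rule is an assumption made for this example, not taken from the PR: an alias wins if present, otherwise the schema-qualified name is used. `TableRef`, `binding_key`, and `first_duplicate` are hypothetical names, and the planner's actual handling of a qualified table against a bare alias may differ.

```rust
use std::collections::HashSet;

/// Hypothetical table reference as written in a FROM clause.
pub struct TableRef<'a> {
    pub schema: Option<&'a str>,
    pub table: &'a str,
    pub alias: Option<&'a str>,
}

/// Assumed key rule: alias if present, else the (possibly qualified) name.
pub fn binding_key(r: &TableRef<'_>) -> String {
    match (r.alias, r.schema) {
        (Some(alias), _) => alias.to_string(),
        (None, Some(schema)) => format!("{}.{}", schema, r.table),
        (None, None) => r.table.to_string(),
    }
}

/// Return the first key that is bound twice, if any.
pub fn first_duplicate(refs: &[TableRef<'_>]) -> Option<String> {
    let mut seen = HashSet::new();
    for r in refs {
        let key = binding_key(r);
        // `insert` returns false when the key was already present.
        if !seen.insert(key.clone()) {
            return Some(key);
        }
    }
    None
}
```

Under this rule, `s1.t` and `s2.t` produce distinct keys, an unaliased `t` collides with another table aliased as `t`, and the same table under two different aliases is accepted.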
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes.
   
   * Queries with duplicate relation aliases (previously allowed in some cases) 
will now correctly return planning errors
   * Error messages for duplicate aliases now carry more precise diagnostics, 
including source locations where available
   
   There are no breaking API changes, but stricter validation may cause 
previously accepted invalid queries to fail.
   
   ---
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   
   ---
   
   ## Additional Notes
   
   This change lays the groundwork for further centralization of name 
resolution and validation logic in the SQL planner, potentially extending to 
CTEs and other relation sources in future work.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

