xiedeyantu opened a new pull request, #21236:
URL: https://github.com/apache/datafusion/pull/21236

   ## Which issue does this PR close?
   
   - Closes #21232.
   
   ## Rationale for this change
   
   When two joined tables share a column with the same name (e.g. `age`), a 
`SELECT *` inside a derived table subquery produces duplicate column names. 
Previously, referencing such a column by its unqualified name from the outer 
query silently succeeded instead of raising an ambiguity error, violating 
standard SQL semantics.
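
As a rough illustration of the situation described above, the following standalone sketch finds the unqualified names that collide when a `SELECT *` flattens the columns of two joined tables. It does not use DataFusion; the function name `duplicated_names` and the `(qualifier, name)` tuple representation are assumptions made only for this example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns the unqualified column names that occur more than once in a
/// list of (table qualifier, column name) pairs, such as the columns
/// produced by `SELECT *` over a join. Hypothetical helper, not a
/// DataFusion API.
fn duplicated_names(columns: &[(&str, &str)]) -> BTreeSet<String> {
    // Count how often each unqualified name appears, ignoring the qualifier.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for (_qualifier, name) in columns {
        *counts.entry(*name).or_insert(0) += 1;
    }
    // Keep only the names that appear at least twice.
    counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(name, _)| name.to_string())
        .collect()
}
```

For tables `a(aid, age)` and `b(bid, age)`, the only name returned is `age`, which is the name an outer query can no longer reference unambiguously.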
   
   ## What changes are included in this PR?
   
   - Added an `ambiguous_names: HashSet<String>` field to `DFSchema` to track 
column names that are structurally ambiguous in a given schema context.
   - Added `DFSchema::with_ambiguous_names` (builder) and 
`DFSchema::ambiguous_names` (accessor) methods.
   - In `SubqueryAlias::try_new`, after `unique_field_aliases` renames 
duplicate columns to keep the Arrow schema valid, the original (pre-rename) 
names are collected into `ambiguous_names` and attached to the output schema.
   - In `DFSchema::qualified_field_with_unqualified_name`, any lookup of an 
ambiguous name now immediately returns `SchemaError::AmbiguousReference`.
   - In `Column::normalize_with_schemas_and_ambiguity_check`, even a single 
structural match is rejected when the containing schema has flagged the name as 
ambiguous.
   - Updated the `bad_extension_planner` snapshot test to include the new 
`ambiguous_names` field in the `DFSchema` debug output.
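
The mechanism in the list above can be sketched with a toy model. This is not the PR's code: `ToySchema`, `index_of_unqualified`, and `toy_subquery_alias` are hypothetical names, the `name:N` rename format is an assumption about what `unique_field_aliases` produces, and errors are plain strings rather than `SchemaError` values.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a schema: field names plus the names flagged ambiguous.
#[derive(Debug, Default)]
struct ToySchema {
    fields: Vec<String>,
    ambiguous_names: HashSet<String>,
}

impl ToySchema {
    /// Builder-style setter, loosely modeled on `DFSchema::with_ambiguous_names`.
    fn with_ambiguous_names(mut self, names: HashSet<String>) -> Self {
        self.ambiguous_names = names;
        self
    }

    /// Looks up a field index by unqualified name. Names flagged as
    /// ambiguous are rejected before any field matching happens.
    fn index_of_unqualified(&self, name: &str) -> Result<usize, String> {
        if self.ambiguous_names.contains(name) {
            return Err(format!("Ambiguous reference to unqualified field {name}"));
        }
        self.fields
            .iter()
            .position(|f| f == name)
            .ok_or_else(|| format!("No field named {name}"))
    }
}

/// Builds the output schema of a derived table. Repeated names get a
/// `:N` suffix (assumed format) so the field list stays unique, and the
/// original pre-rename name is recorded as ambiguous.
fn toy_subquery_alias(input: &[&str]) -> ToySchema {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut ambiguous = HashSet::new();
    let mut fields = Vec::with_capacity(input.len());
    for name in input {
        let count = seen.entry(*name).or_insert(0);
        if *count == 0 {
            fields.push(name.to_string());
        } else {
            fields.push(format!("{name}:{count}"));
            ambiguous.insert(name.to_string());
        }
        *count += 1;
    }
    ToySchema { fields, ..Default::default() }.with_ambiguous_names(ambiguous)
}
```

With input `["aid", "age", "bid", "age"]`, the second `age` is renamed, so a plain name match would find exactly one field called `age` and resolve it silently. Checking the recorded set first is what turns that single match into an error, while `aid` and `bid` still resolve normally.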
   
   ## Are these changes tested?
   
   The existing `join_with_ambiguous_column`, `order_by_ambiguous_name`, and 
`group_by_ambiguous_name` tests continue to pass. A new test case covering the 
reported scenario still needs to be added to 
`datafusion/sql/tests/sql_integration.rs`, using the query 
`SELECT age FROM (SELECT * FROM a JOIN b ON a.aid = b.bid) AS t`.
   
   ## Are there any user-facing changes?
   
   Yes. Queries that previously resolved an ambiguous unqualified column 
reference through a derived-table subquery without complaint now fail with a 
`Schema error: Ambiguous reference to unqualified field <name>` error. This is 
consistent with standard SQL behavior and with how DataFusion already handles 
the same ambiguity at the direct JOIN level.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

