xiedeyantu opened a new pull request, #21236: URL: https://github.com/apache/datafusion/pull/21236
## Which issue does this PR close?

- Closes #21232.

## Rationale for this change

When two joined tables share a column with the same name (e.g. `age`), a `SELECT *` inside a derived table subquery produces duplicate column names. Previously, referencing such a column by its unqualified name from the outer query silently succeeded instead of raising an ambiguity error, violating standard SQL semantics.

## What changes are included in this PR?

- Added an `ambiguous_names: HashSet<String>` field to `DFSchema` to track column names that are structurally ambiguous in a given schema context.
- Added `DFSchema::with_ambiguous_names` (builder) and `DFSchema::ambiguous_names` (accessor) methods.
- In `SubqueryAlias::try_new`, after `unique_field_aliases` renames duplicate columns to keep the Arrow schema valid, the original (pre-rename) names are collected into `ambiguous_names` and attached to the output schema.
- In `DFSchema::qualified_field_with_unqualified_name`, any lookup of an ambiguous name now immediately returns `SchemaError::AmbiguousReference`.
- In `Column::normalize_with_schemas_and_ambiguity_check`, even a single structural match is rejected when the containing schema has flagged the name as ambiguous.
- Updated the `bad_extension_planner` snapshot test to include the new `ambiguous_names` field in the `DFSchema` debug output.

## Are these changes tested?

The existing `join_with_ambiguous_column`, `order_by_ambiguous_name`, and `group_by_ambiguous_name` tests continue to pass. A new test case covering the reported scenario (`select age from (SELECT * FROM a join b on a.aid = b.bid) as t`) should be added to `datafusion/sql/tests/sql_integration.rs`.

## Are there any user-facing changes?

Yes.
Queries that previously silently resolved an ambiguous column reference through a derived-table subquery will now receive a `Schema error: Ambiguous reference to unqualified field <name>` error, consistent with standard SQL behavior and with how DataFusion already handles the same ambiguity at the direct JOIN level.
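
To illustrate the mechanism described under "What changes are included in this PR?", here is a minimal, self-contained sketch of the idea. It is not the DataFusion implementation: `ToySchema`, `LookupError`, `from_subquery_columns`, and `field_index` are hypothetical names invented for this example, and the `name:N` rename suffix is an assumption about how duplicates might be made unique. The sketch only shows the two steps the PR describes: record the pre-rename name as ambiguous when a duplicate is renamed, and reject any unqualified lookup of a recorded name.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type for this sketch (not DataFusion's `SchemaError`).
#[derive(Debug, PartialEq)]
enum LookupError {
    AmbiguousReference(String),
    FieldNotFound(String),
}

/// Toy stand-in for a schema: unique output field names plus the set of
/// original names that had to be renamed because they were duplicated.
#[derive(Debug)]
struct ToySchema {
    fields: Vec<String>,
    ambiguous_names: HashSet<String>,
}

impl ToySchema {
    /// Build the schema of a derived table from the inner query's columns.
    /// The first occurrence of a name is kept; later occurrences are renamed
    /// to `name:N` (assumed scheme) and the original name is flagged.
    /// Simplification: no check that `name:N` collides with a real column.
    fn from_subquery_columns(columns: &[&str]) -> Self {
        let mut seen: HashMap<&str, usize> = HashMap::new();
        let mut fields = Vec::with_capacity(columns.len());
        let mut ambiguous_names = HashSet::new();

        for &name in columns {
            let count = seen.entry(name).or_insert(0);
            if *count == 0 {
                fields.push(name.to_string());
            } else {
                fields.push(format!("{}:{}", name, *count));
                ambiguous_names.insert(name.to_string());
            }
            *count += 1;
        }

        ToySchema { fields, ambiguous_names }
    }

    /// Resolve an unqualified name. A flagged name is rejected up front,
    /// even though exactly one field still carries that name after renaming.
    fn field_index(&self, name: &str) -> Result<usize, LookupError> {
        if self.ambiguous_names.contains(name) {
            return Err(LookupError::AmbiguousReference(name.to_string()));
        }
        self.fields
            .iter()
            .position(|f| f == name)
            .ok_or_else(|| LookupError::FieldNotFound(name.to_string()))
    }
}
```

With the columns of `SELECT * FROM a join b on a.aid = b.bid` modelled as `["aid", "age", "bid", "age"]`, `field_index("age")` returns the ambiguity error while `field_index("aid")` still resolves. The ambiguity check runs before the field scan because, after renaming, a plain scan would find exactly one `age` and report success, which is the behavior the PR sets out to remove.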
