[
https://issues.apache.org/jira/browse/CALCITE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068876#comment-18068876
]
Nirmal Govindaraj commented on CALCITE-7453:
--------------------------------------------
I only tested the behaviour on Calcite and Postgres before raising this issue,
I have now tested across PostgreSQL 16, MySQL 8.4, MariaDB 11.8, and SQLite
3.51.0 and Calcite. Here are the queries I tested and their results
# SELECT * FROM (SELECT deptno AS num, empno AS num FROM emp) AS t
## Calcite - fails
## MySQL - fails
## MariaDB - fails
## Postgres - Succeeds with headers num, num
## SQLite - Succeeds with headers num, num:1
# SELECT deptno FROM (SELECT deptno, empno as num, empno as num FROM emp) AS t
## Calcite - succeeds
## MySQL - fails
## MariaDB - fails
## Postgres - Succeeds
## SQLite - Succeeds
# SELECT deptno FROM (SELECT empno as deptno, e.* FROM emp e) AS t
## Calcite - succeeds with header deptno and column empno
## MySQL - fails
## MariaDB - fails
## Postgres - fails
## SQLite - succeeds with header deptno and column empno
# SELECT deptno FROM dept d join emp e on d.deptno = e.deptno
## Calcite - fails
## MySQL - fails
## MariaDB - fails
## Postgres - fails
## SQLite - fails
# SELECT deptno FROM (SELECT * FROM dept d join emp e on d.deptno = e.deptno)
as t
## Calcite - Suceeds with header deptno and depts deptno column
## MySQL - fails
## MariaDB - fails
## Postgres - fails
## SQLite - Suceeds with header deptno and depts deptno column
Some conclusions we can draw from this
# PostgreSQL, MySQL, and MariaDB are internally consistent, but they choose
different policies.
# PostgreSQL is permissive: duplicate output names are allowed in a subquery,
but later references to those names fail when ambiguous.
# MySQL and MariaDB are stricter: a subquery in FROM is treated as a derived
table, and derived tables must expose unique column names.
# SQLite is permissive in a different way: it accepts duplicate output names,
but silently uniquifies them (num:1, etc.), which can make later references
succeed where PostgreSQL would report ambiguity.
# Calcite currently appears to mix these models. In particular, it does not
consistently reject duplicate output names in derived tables like
MySQL/MariaDB, but it also does not consistently preserve them and defer
ambiguity checking like PostgreSQL. The inconsistency seems especially visible
when duplicate names are introduced through * expansion, where Calcite behaves
closer to SQLite by uniquifying names.
I think query (2) is a useful non-* example showing that Calcite does not
currently enforce MySQL-style “derived table columns must be unique” semantics.
Queries (3) and (5) are useful because they show a second inconsistency: when
duplicate names arise through * expansion, Calcite may silently disambiguate
them and allow outer references, which is closer to SQLite than to either
PostgreSQL or MySQL/MariaDB.
The PR I raised brings Postgres behaviour into calcite
> Preserve duplicate output names through star expansion and report resulting
> ambiguity correctly
> -----------------------------------------------------------------------------------------------
>
> Key: CALCITE-7453
> URL: https://issues.apache.org/jira/browse/CALCITE-7453
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.42.0
> Reporter: Nirmal Govindaraj
> Assignee: Nirmal Govindaraj
> Priority: Major
> Labels: pull-request-available
>
> h2. Issue
> Two queries exposed the same underlying problem from opposite directions.
> {code:java}
> select * from (select deptno as num, empno as num from emp){code}
> should work, but failed with exception
> {code:java}
> At line 1, column 8: Column 'NUM' is
> ambiguousorg.apache.calcite.runtime.CalciteContextException: At line 1,
> column 8: Column 'NUM' is ambiguous at
> java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
> at
> java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
> at
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
> at
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
> at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960) at
> org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945) at
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:6151)
> at
> org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:531)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.findTableColumnPair(SqlValidatorImpl.java:4476)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.isRolledUpColumn(SqlValidatorImpl.java:4523)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.expandStar(SqlValidatorImpl.java:715)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:481)
> {code}
> {code:java}
> select deptno from (select deptno, * from dept){code}
> should fail as ambiguous, but did not
> The first query failed because Calcite lost exact field identity after star
> expansion when a subquery exposed multiple columns with the same visible
> name. Later validation paths such as rolled-up-column checks re-resolved the
> expanded identifier by name, and that name-based lookup could bind to the
> wrong field. In practice this surfaced via isRolledUp, but the underlying
> issue was broader: after star expansion, two distinct source fields had
> collapsed to the same visible identifier and Calcite no longer knew which one
> it was looking at.
> The second query did not fail because star expansion was uniquifying output
> aliases. That meant Calcite was not preserving the real visible row type of
> the subquery. Once the duplicate column name was rewritten to a synthetic
> unique name, outer resolution no longer saw an ambiguity.
> These are two sides of the same problem: Calcite was not preserving duplicate
> visible output names through star, and it also did not retain exact field
> identity when duplicate names survived star expansion.
> h2. Solution
> This change fixes the problem at the root:
> * Star expansion now preserves visible output names exactly, including
> duplicates. For duplicate-name fields produced by star expansion, Calcite
> records exact source-field metadata during validation.
> * Later consumers use that exact metadata where name-based lookup is no
> longer sufficient
> * Ambiguous references are now reported where the preserved output row type
> is genuinely ambiguous.
> h2. Notes
> # EXCLUDE / EXCEPT duplicate-name semantics are not being defined by this
> PR and can be discussed separately in follow-up work.
> # Existing pivot.iq coverage includes cases that intentionally generate
> same-name aliases and currently fail, but after preserving duplicate output
> names the expected behavior becomes less clear and raises a broader design
> question about how pivot-generated aliases should behave. That discussion is
> better handled in a follow-up PR.
> h2. Compatibility
> This will change visible output column names compared to previous Calcite
> behavior
--
This message was sent by Atlassian Jira
(v8.20.10#820010)