[jira] [Commented] (CALCITE-7453) Preserve duplicate output names through star expansion and report resulting ambiguity correctly

Nirmal Govindaraj (Jira) Thu, 26 Mar 2026 21:42:34 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068876#comment-18068876
 ]


Nirmal Govindaraj commented on CALCITE-7453:
--------------------------------------------

I only tested the behaviour on Calcite and Postgres before raising this issue, 
I have now tested across PostgreSQL 16, MySQL 8.4, MariaDB 11.8, and SQLite 
3.51.0 and Calcite. Here are the queries I tested and their results
 # SELECT * FROM (SELECT deptno AS num, empno AS num FROM emp) AS t
 ## Calcite - fails
 ## MySQL - fails
 ## MariaDB - fails
 ## Postgres - Succeeds with headers num, num
 ## SQLite - Succeeds with headers num, num:1
 # SELECT deptno FROM (SELECT deptno, empno as num, empno as num FROM emp) AS t
 ## Calcite - succeeds
 ## MySQL - fails
 ## MariaDB - fails
 ## Postgres - Succeeds
 ## SQLite - Succeeds
 # SELECT deptno FROM (SELECT empno as deptno, e.* FROM emp e) AS t
 ## Calcite - succeeds with header deptno and column empno
 ## MySQL - fails
 ## MariaDB - fails
 ## Postgres - fails
 ## SQLite - succeeds with header deptno and column empno
 # SELECT deptno FROM dept d join emp e on d.deptno = e.deptno
 ## Calcite - fails
 ## MySQL - fails
 ## MariaDB - fails
 ## Postgres - fails
 ## SQLite - fails
 # SELECT deptno FROM (SELECT * FROM dept d join emp e on d.deptno = e.deptno) 
as t
 ## Calcite - Suceeds with header deptno and depts deptno column
 ## MySQL - fails
 ## MariaDB - fails
 ## Postgres - fails
 ## SQLite - Suceeds with header deptno and depts deptno column

Some conclusions we can draw from this
 # PostgreSQL, MySQL, and MariaDB are internally consistent, but they choose 
different policies.
 # PostgreSQL is permissive: duplicate output names are allowed in a subquery, 
but later references to those names fail when ambiguous.
 # MySQL and MariaDB are stricter: a subquery in FROM is treated as a derived 
table, and derived tables must expose unique column names.
 # SQLite is permissive in a different way: it accepts duplicate output names, 
but silently uniquifies them (num:1, etc.), which can make later references 
succeed where PostgreSQL would report ambiguity.
 # Calcite currently appears to mix these models. In particular, it does not 
consistently reject duplicate output names in derived tables like 
MySQL/MariaDB, but it also does not consistently preserve them and defer 
ambiguity checking like PostgreSQL. The inconsistency seems especially visible 
when duplicate names are introduced through * expansion, where Calcite behaves 
closer to SQLite by uniquifying names.

I think query (2) is a useful non-* example showing that Calcite does not 
currently enforce MySQL-style “derived table columns must be unique” semantics.

Queries (3) and (5) are useful because they show a second inconsistency: when 
duplicate names arise through * expansion, Calcite may silently disambiguate 
them and allow outer references, which is closer to SQLite than to either 
PostgreSQL or MySQL/MariaDB.

The PR I raised brings Postgres behaviour into calcite

> Preserve duplicate output names through star expansion and report resulting 
> ambiguity correctly
> -----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7453
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7453
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.42.0
>            Reporter: Nirmal Govindaraj
>            Assignee: Nirmal Govindaraj
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Issue
> Two queries exposed the same underlying problem from opposite directions.
> {code:java}
> select * from (select deptno as num, empno as num from emp){code}
> should work, but failed with exception
> {code:java}
> At line 1, column 8: Column 'NUM' is 
> ambiguousorg.apache.calcite.runtime.CalciteContextException: At line 1, 
> column 8: Column 'NUM' is ambiguous      at 
> java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
>   at 
> java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
>   at 
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)    
> at 
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)  
> at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960) at 
> org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945) at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:6151)
>       at 
> org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:531)
>        at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.findTableColumnPair(SqlValidatorImpl.java:4476)
>      at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.isRolledUpColumn(SqlValidatorImpl.java:4523)
>         at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.expandStar(SqlValidatorImpl.java:715)
>        at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:481)
>   {code}
> {code:java}
> select deptno from (select deptno, * from dept){code}
> should fail as ambiguous, but did not
> The first query failed because Calcite lost exact field identity after star 
> expansion when a subquery exposed multiple columns with the same visible 
> name. Later validation paths such as rolled-up-column checks re-resolved the 
> expanded identifier by name, and that name-based lookup could bind to the 
> wrong field. In practice this surfaced via isRolledUp, but the underlying 
> issue was broader: after star expansion, two distinct source fields had 
> collapsed to the same visible identifier and Calcite no longer knew which one 
> it was looking at.
> The second query did not fail because star expansion was uniquifying output 
> aliases. That meant Calcite was not preserving the real visible row type of 
> the subquery. Once the duplicate column name was rewritten to a synthetic 
> unique name, outer resolution no longer saw an ambiguity.
> These are two sides of the same problem: Calcite was not preserving duplicate 
> visible output names through star, and it also did not retain exact field 
> identity when duplicate names survived star expansion.
> h2. Solution
> This change fixes the problem at the root:
>  * Star expansion now preserves visible output names exactly, including 
> duplicates. For duplicate-name fields produced by star expansion, Calcite 
> records exact source-field metadata during validation.
>  * Later consumers use that exact metadata where name-based lookup is no 
> longer sufficient
>  * Ambiguous references are now reported where the preserved output row type 
> is genuinely ambiguous.
> h2. Notes
>  #  EXCLUDE / EXCEPT duplicate-name semantics are not being defined by this 
> PR and can be discussed separately in follow-up work.
>  # Existing pivot.iq coverage includes cases that intentionally generate 
> same-name aliases and currently fail, but after preserving duplicate output 
> names the expected behavior becomes less clear and raises a broader design 
> question about how pivot-generated aliases should behave. That discussion is 
> better handled in a follow-up PR.
> h2. Compatibility
> This will change visible output column names compared to previous Calcite 
> behavior



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7453) Preserve duplicate output names through star expansion and report resulting ambiguity correctly

Reply via email to