Nirmal Govindaraj created CALCITE-7453:
------------------------------------------

             Summary: Preserve duplicate output names through star expansion 
and report resulting ambiguity correctly
                 Key: CALCITE-7453
                 URL: https://issues.apache.org/jira/browse/CALCITE-7453
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.42.0
            Reporter: Nirmal Govindaraj
            Assignee: Nirmal Govindaraj


h2. Issue

Two queries exposed the same underlying problem from opposite directions.

 
{code:java}
select * from (select deptno as num, empno as num from emp){code}
should work, but it failed with the following exception:

 
{code:java}
At line 1, column 8: Column 'NUM' is ambiguous
org.apache.calcite.runtime.CalciteContextException: At line 1, column 8: Column 'NUM' is ambiguous
  at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
  at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
  at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
  at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
  at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960)
  at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
  at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:6151)
  at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:531)
  at org.apache.calcite.sql.validate.SqlValidatorImpl.findTableColumnPair(SqlValidatorImpl.java:4476)
  at org.apache.calcite.sql.validate.SqlValidatorImpl.isRolledUpColumn(SqlValidatorImpl.java:4523)
  at org.apache.calcite.sql.validate.SqlValidatorImpl.expandStar(SqlValidatorImpl.java:715)
  at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:481){code}
 

 

  
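Conversely, the first of the two queries below should fail as ambiguous, just as the second one does, but it succeeded:
{code:java}
select deptno from (select deptno, * from dept) -- succeeds, but should fail as ambiguous
select deptno from (select deptno, deptno from dept) -- fails correctly{code}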

The first query failed because Calcite lost exact field identity after star 
expansion when a subquery exposed multiple columns with the same visible name. 
Later validation paths such as rolled-up-column checks re-resolved the expanded 
identifier by name, and that name-based lookup could bind to the wrong field. 
In practice this surfaced via isRolledUpColumn (see the stack trace above), but the underlying issue was broader: 
after star expansion, two distinct source fields had collapsed to the same 
visible identifier and Calcite no longer knew which one it was looking at.
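
To make the failure mode concrete, here is a minimal toy model in plain Java. It is not Calcite code: the class {{NameLookupSketch}} and the method {{findByName}} are hypothetical names, and a row type is reduced to an ordered list of field names. It only illustrates why a name-based re-lookup cannot recover the field that star expansion was actually visiting.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Calcite code: a row type is just an ordered list of names.
class NameLookupSketch {
  /** Returns the ordinals of all fields whose name matches, ignoring case. */
  static List<Integer> findByName(List<String> fieldNames, String name) {
    List<Integer> matches = new ArrayList<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      if (fieldNames.get(i).equalsIgnoreCase(name)) {
        matches.add(i);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Visible row type of: select deptno as num, empno as num from emp
    List<String> rowType = List.of("NUM", "NUM");

    // Star expansion walks the fields by position, so it knows each ordinal,
    // but re-resolving by name alone yields more than one candidate.
    for (int ordinal = 0; ordinal < rowType.size(); ordinal++) {
      String name = rowType.get(ordinal);
      List<Integer> candidates = findByName(rowType, name);
      System.out.println("ordinal " + ordinal + ", name " + name
          + ", candidates by name: " + candidates);
    }
  }
}
```

In this model both ordinals map back to the same two candidates, which is the situation the validator reported as "Column 'NUM' is ambiguous".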

 

The second query did not fail because star expansion was uniquifying output 
aliases. That meant Calcite was not preserving the real visible row type of the 
subquery. Once the duplicate column name was rewritten to a synthetic unique 
name, outer resolution no longer saw an ambiguity.
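
The effect of uniquifying can be sketched with another toy model in plain Java. Everything here is an assumption made for illustration: {{UniquifySketch}}, {{uniquify}} and {{countMatches}} are hypothetical names, the numeric-suffix scheme is only one possible way to build synthetic names, and {{DNAME}} stands in for whatever other columns {{dept}} has.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Calcite code.
class UniquifySketch {
  /** Renames repeated names by appending a counter (hypothetical scheme). */
  static List<String> uniquify(List<String> names) {
    Set<String> used = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (String name : names) {
      String candidate = name;
      int suffix = 0;
      // Keep trying name0, name1, ... until the candidate is unused.
      while (!used.add(candidate)) {
        candidate = name + suffix++;
      }
      result.add(candidate);
    }
    return result;
  }

  /** Counts how many visible fields an outer reference to name would match. */
  static long countMatches(List<String> names, String name) {
    return names.stream().filter(n -> n.equals(name)).count();
  }

  public static void main(String[] args) {
    // Visible row type of: select deptno, * from dept
    // (DNAME is an assumed column standing in for the rest of dept).
    List<String> preserved = List.of("DEPTNO", "DEPTNO", "DNAME");
    List<String> uniquified = uniquify(preserved);

    // With preserved names the outer "select deptno" sees two matches.
    System.out.println(preserved + ": " + countMatches(preserved, "DEPTNO"));
    // After uniquifying it sees only one, so no ambiguity is reported.
    System.out.println(uniquified + ": " + countMatches(uniquified, "DEPTNO"));
  }
}
```

With the preserved names the outer reference matches two fields; after renaming it matches one, so the ambiguity silently disappears.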

These are two sides of the same problem: Calcite was not preserving duplicate 
visible output names through star expansion, and it also did not retain exact 
field identity when duplicate names survived star expansion.

h2. Solution

This change fixes the problem at the root:
 * Star expansion now preserves visible output names exactly, including 
duplicates. For duplicate-name fields produced by star expansion, Calcite 
records exact source-field metadata during validation.
 * Later consumers use that exact metadata where name-based lookup is no longer 
sufficient.
 * Ambiguous references are now reported where the preserved output row type is 
genuinely ambiguous.
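
The three points above can be sketched as follows. This is a rough illustration in plain Java under stated assumptions, not the code in the PR: {{StarExpansionSketch}}, {{ExpandedField}}, {{expandStar}} and {{resolveByName}} are hypothetical names, and the "exact source-field metadata" is modelled simply as the source ordinal.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Calcite code.
class StarExpansionSketch {
  /** One expanded star item: visible name plus exact source position. */
  static final class ExpandedField {
    final String name;
    final int sourceOrdinal;

    ExpandedField(String name, int sourceOrdinal) {
      this.name = name;
      this.sourceOrdinal = sourceOrdinal;
    }
  }

  /** Expands a star, keeping names as-is (duplicates included). */
  static List<ExpandedField> expandStar(List<String> sourceNames) {
    List<ExpandedField> out = new ArrayList<>();
    for (int i = 0; i < sourceNames.size(); i++) {
      out.add(new ExpandedField(sourceNames.get(i), i));
    }
    return out;
  }

  /** Resolves an outer reference by name; fails if it is not unique. */
  static int resolveByName(List<ExpandedField> fields, String name) {
    int found = -1;
    for (ExpandedField f : fields) {
      if (f.name.equals(name)) {
        if (found >= 0) {
          throw new IllegalStateException("Column '" + name + "' is ambiguous");
        }
        found = f.sourceOrdinal;
      }
    }
    if (found < 0) {
      throw new IllegalArgumentException("Column '" + name + "' not found");
    }
    return found;
  }

  public static void main(String[] args) {
    List<ExpandedField> fields = expandStar(List.of("NUM", "NUM"));
    // Later consumers use the recorded ordinal instead of a name lookup.
    for (ExpandedField f : fields) {
      System.out.println(f.name + " comes from source field " + f.sourceOrdinal);
    }
    // An explicit outer reference by name is still genuinely ambiguous.
    try {
      resolveByName(fields, "NUM");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The design point is that consumers of an expanded star item never need to go back through the name, while an explicit reference written by the user still goes through name resolution and is rejected when the preserved row type contains duplicates.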

h2. Notes

  1. EXCLUDE / EXCEPT duplicate-name semantics are not being defined by this PR 
and can be discussed separately in follow-up work.
  2. Existing pivot.iq coverage includes cases that intentionally generate 
same-name aliases and currently fail. Once duplicate output names are 
preserved, the expected behavior of these cases becomes less clear, which 
raises a broader design question about how pivot-generated aliases should 
behave. That discussion is better handled in a follow-up PR.

h2. Compatibility

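This changes visible output column names compared to previous Calcite 
behavior: duplicate names produced by star expansion are now preserved rather 
than rewritten to synthetic unique names.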



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
