Hi Yuan Tian,

I have completed the implementation for Issue #17797 Part 2: supporting
lateral column aliases in the table-model SELECT list.

The PR is available here:

https://github.com/apache/iotdb/pull/17960

This PR adds left-to-right SELECT-list alias resolution, keeps local input
columns at higher priority than SELECT aliases, avoids rewriting qualified
expressions and subqueries, preserves WHERE/HAVING semantics, and keeps the
existing ORDER BY alias behavior. It also includes analyzer tests and a
plan-level test to ensure reused aliases do not cause duplicated projection
computation.
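
For readers skimming the thread, the resolution rules can be sketched on a toy expression model. This is only an illustration under stated assumptions, not the code in the PR: LcaSketch, Expr, rewrite, and resolve are hypothetical names, literals are modeled as plain leaf tokens, and qualified names and subqueries are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of left-to-right SELECT alias resolution (hypothetical, not IoTDB code). */
final class LcaSketch {

  /** Minimal expression tree: a leaf token (identifier or literal) or a binary operation. */
  static final class Expr {
    final String token; // set for leaves only
    final String op;    // set for binary operations only
    final Expr left;
    final Expr right;

    private Expr(String token, String op, Expr left, Expr right) {
      this.token = token;
      this.op = op;
      this.left = left;
      this.right = right;
    }

    static Expr leaf(String token) {
      return new Expr(token, null, null, null);
    }

    static Expr bin(String op, Expr left, Expr right) {
      return new Expr(null, op, left, right);
    }

    /** Builds fresh node instances so no node object is shared between SELECT items. */
    Expr deepCopy() {
      return token != null ? leaf(token) : bin(op, left.deepCopy(), right.deepCopy());
    }

    @Override
    public String toString() {
      return token != null ? token : "(" + left + " " + op + " " + right + ")";
    }
  }

  /** Rewrites one SELECT expression using only the aliases defined to its left. */
  static Expr rewrite(Expr e, Set<String> localColumns, Map<String, List<Expr>> visible) {
    if (e.token == null) {
      return Expr.bin(
          e.op, rewrite(e.left, localColumns, visible), rewrite(e.right, localColumns, visible));
    }
    if (localColumns.contains(e.token)) {
      return e; // a local input column outranks any SELECT alias
    }
    List<Expr> matches = visible.get(e.token);
    if (matches == null) {
      return e; // not an alias: leave it for the existing resolution rules
    }
    if (matches.size() > 1) {
      throw new IllegalStateException("Column alias '" + e.token + "' is ambiguous");
    }
    return matches.get(0).deepCopy();
  }

  /** Resolves items in order; an alias becomes visible only after its own item. */
  static List<String> resolve(Set<String> localColumns, String[] aliases, Expr[] items) {
    Map<String, List<Expr>> visible = new HashMap<>();
    List<String> rendered = new ArrayList<>();
    for (int i = 0; i < items.length; i++) {
      Expr rewritten = rewrite(items[i], localColumns, visible);
      rendered.add(rewritten.toString());
      if (aliases[i] != null) {
        visible.computeIfAbsent(aliases[i], k -> new ArrayList<>()).add(rewritten);
      }
    }
    return rendered;
  }
}
```

With local columns s1 and s2, the list "s1 AS x, x + 1 AS y, y * 2 AS z" renders as s1, (s1 + 1), ((s1 + 1) * 2) in this model. A forward reference such as "y + 1 AS x, s1 AS y" leaves y untouched, so the existing analysis would report it as an unknown column.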

I have verified the changes with:

./mvnw spotless:apply -pl iotdb-core/datanode
./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

The PR is ready for review. It will close #17797 after being merged.

Best regards,
Bryan Yang (杨易达)

On Tue, Jun 16, 2026 at 2:22 PM Yuan Tian <[email protected]> wrote:

> Hi Bryan,
>
>
> Thanks for the detailed proposal. Overall, the direction looks
> reasonable to me. Reusing the existing SELECT syntax and resolving LCA
> during analysis also seems consistent with the current alias-reuse
> work for GROUP BY / ORDER BY.
>
> A few comments after looking at the current analyzer and planner code:
>
> 1. The proposed resolution priority looks good to me:
>
> local source column > visible previous aliases > existing analyzer
> resolution
>
> This matches the current GROUP BY alias behavior, where local input
> columns take precedence over SELECT aliases, while outer-scope columns
> should not block aliases defined in the current SELECT list.
>
> 2. I agree that duplicate aliases should be treated as ambiguous
> instead of overwriting each other. This is also consistent with the
> current SELECT alias reuse behavior.
>
> 3. Please be careful with performance. For queries that do not use
> LCA, we should avoid adding noticeable overhead. For example, if there
> are no visible previous aliases, the rewrite step should be skipped
> directly. Also, alias lookup should probably use a canonical-name
> map/multimap instead of scanning a list for every Identifier.
>
> 4. For queries that do use LCA, deep-copying the defining expression
> is necessary because Analysis uses NodeRef identity-based maps.
> However, pure expression inlining may duplicate scalar computations in
> the generated plan. For example:
>
> SELECT expensive(s1) AS x, x + 1 AS y
> FROM table1;
>
> If this is rewritten as:
>
> SELECT expensive(s1) AS x, expensive(s1) + 1 AS y
> FROM table1;
>
> then the scalar expression may be evaluated twice unless the planner
> later deduplicates it. This is especially worth checking for chained
> aliases, where repeated deep copies could cause the expression tree to
> grow quickly. I think we should add plan-level tests for this, not
> only analyzer tests.
>
> 5. I agree that the rewriter must not enter SubqueryExpression, and
> DereferenceExpression needs special handling. In particular,
> expressions like t.x, table1.x, and x.y should not rewrite the base
> identifier as an alias reference.
>
> 6. For window function aliases, I agree with your note: either copied
> window functions must have their resolved window metadata registered
> again, or the first implementation should reject this case explicitly
> with a clear error message.
>
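
On point 3, the lookup structure could look roughly like the sketch below. It is a hedged illustration, not existing IoTDB code: AliasIndex and its methods are hypothetical names, and canonicalization is assumed to be simple lowercasing, which may differ from what Identifier.getCanonicalValue() actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical canonical-name multimap from alias name to SELECT positions. */
final class AliasIndex {
  private final Map<String, List<Integer>> positionsByName = new HashMap<>();

  /** Assumed canonical form: lowercase. The real rule lives in the analyzer. */
  static String canonical(String name) {
    return name.toLowerCase(Locale.ENGLISH);
  }

  /** Registers an alias; duplicates are kept side by side, never overwritten. */
  void add(String alias, int position) {
    positionsByName.computeIfAbsent(canonical(alias), k -> new ArrayList<>()).add(position);
  }

  /** Fast path: with no visible aliases the rewrite step can be skipped entirely. */
  boolean isEmpty() {
    return positionsByName.isEmpty();
  }

  /** One hash lookup per identifier instead of a scan over all previous aliases. */
  List<Integer> lookup(String identifier) {
    List<Integer> hit = positionsByName.get(canonical(identifier));
    return hit == null ? Collections.<Integer>emptyList() : Collections.unmodifiableList(hit);
  }
}
```

A caller would check isEmpty() before walking an expression at all, and treat a lookup result with more than one position as the ambiguous-alias case from point 2.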
> One small wording point: I think “Forward references are not
> supported” is the correct English term here, because the expression is
> referring to an alias defined later in the SELECT list. To avoid
> ambiguity, maybe we can phrase it as:
>
> “Forward references, i.e., references to aliases defined later in the
> SELECT list, are not supported.”
>
> Best regards,
> ---------------
> Yuan Tian
>
>
> On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:
>
> > *Hi IoTDB community,*
> >
> > I would like to propose the implementation plan for Part 2 of
> > apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
> > model SELECT list.
> >
> > Issue: https://github.com/apache/iotdb/issues/17797
> >
> > LCA allows a later SELECT item to reference an explicit alias defined by
> > an earlier SELECT item.
> >
> > SELECT s1 AS x, x + 1 AS y
> > FROM table1;
> >
> > This should be analyzed as:
> >
> > SELECT s1 AS x, s1 + 1 AS y
> > FROM table1;
> >
> > The implementation does not require new keywords or grammar changes. The
> > existing SELECT syntax can be reused, and LCA can be resolved during
> > analysis by rewriting expressions before type analysis, aggregation
> > analysis, output scope computation, and planning.
> >
> > Proposed semantics
> >
> > LCA is resolved from left to right inside the SELECT list.
> >
> > SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> > FROM table1;
> >
> > is equivalent to:
> >
> > SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> > FROM table1;
> >
> > Forward references are not supported:
> >
> > SELECT y + 1 AS x, s1 AS y
> > FROM table1;
> >
> > The alias of the current SELECT item is not visible to its own
> > expression:
> >
> > SELECT x + 1 AS x
> > FROM table1;
> >
> > Only unqualified identifiers are considered alias references. Qualified
> > expressions such as t.x, table1.x, or x.y should continue to use the
> > existing qualified column resolution rules.
> >
> > The recommended resolution priority is:
> >
> > local source column > visible previous aliases > existing analyzer
> > resolution
> >
> > This means local input columns should take precedence over previous
> > aliases, while outer query columns should not block visible aliases from
> > the current SELECT list.
> >
> > Duplicate aliases should not overwrite each other. If multiple previous
> > aliases have the same canonical name and there is no same-name local
> > source column, the analyzer should report:
> >
> > Column alias 'x' is ambiguous
> >
> > Non-goals
> >
> > This change should not alter WHERE or HAVING semantics.
> >
> > SELECT s1 AS x FROM table1 WHERE x > 1;
> > SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> >
> > These should still resolve x or a only through the input scope, not
> > through SELECT aliases.
> >
> > LCA should also not enter subquery scopes:
> >
> > SELECT s1 AS x, (SELECT x FROM table1) AS y
> > FROM table1;
> >
> > The x inside the subquery should not be rewritten using the outer SELECT
> > alias.
> >
> > Analyzer changes
> >
> > The entry point should be StatementAnalyzer.analyzeSelect.
> >
> > The SELECT list can be processed left to right. For each normal
> > SingleColumn:
> >
> > 1. Rewrite the original expression using visible previous aliases.
> > 2. Register window metadata for any newly copied window functions.
> > 3. Analyze the rewritten expression.
> > 4. Record the rewritten expression in SelectAnalysis output expressions.
> > 5. Record SingleColumn -> rewritten expression mapping.
> > 6. Add the current explicit alias to visible aliases only after its
> > expression is rewritten and analyzed.
> >
> > Pseudo-code:
> >
> > List<SelectAlias> visibleAliases = new ArrayList<>();
> >
> > for (SelectItem item : node.getSelect().getSelectItems()) {
> >   if (item instanceof SingleColumn) {
> >     SingleColumn singleColumn = (SingleColumn) item;
> >
> >     Expression originalExpression = singleColumn.getExpression();
> >     Expression rewrittenExpression =
> >         rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);
> >
> >     resolveWindowFunctionsInExpression(node, rewrittenExpression);
> >
> >     analyzeSelectSingleColumn(
> >         rewrittenExpression,
> >         node,
> >         scope,
> >         outputExpressionBuilder,
> >         selectExpressionBuilder);
> >
> >     singleColumnOutputExpressions.put(
> >         NodeRef.of(singleColumn),
> >         ImmutableList.of(rewrittenExpression));
> >
> >     if (singleColumn.getAlias().isPresent()
> >         && !containsColumnsFunction(singleColumn)) {
> >       Identifier alias = singleColumn.getAlias().get();
> >       visibleAliases.add(
> >           new SelectAlias(
> >               alias.getCanonicalValue(),
> >               outputPosition,
> >               rewrittenExpression));
> >     }
> >
> >     outputPosition++;
> >   }
> > }
> >
> > SelectAlias can be extended to keep:
> >
> > canonicalName
> > position
> > rewrittenExpression
> >
> > SelectAnalysis should keep semantic output expressions per SingleColumn,
> > for example:
> >
> > NodeRef<SingleColumn> -> List<Expression>
> >
> > The original AST should remain unchanged because SingleColumn.expression
> > is final.
> >
> > Expression rewriting
> >
> > LCA rewriting should replace only unqualified Identifiers.
> >
> > When an identifier is encountered:
> >
> > 1. If it resolves to a local input column, keep it unchanged.
> > 2. Otherwise, look up visible previous aliases.
> > 3. If exactly one alias matches, replace the identifier with a deep
> > copy of that alias's rewritten expression.
> > 4. If multiple aliases match, report alias ambiguity.
> > 5. If no alias matches, keep the identifier unchanged and let existing
> > analysis handle it.
> >
> > The rewriter should not traverse into SubqueryExpression.
> >
> > It should also handle DereferenceExpression carefully so that expressions
> > such as t.x, table1.x, and x.y are not rewritten as alias references.
> >
> > Each alias replacement must create a deep copy of the defining
> > expression. Reusing the same expression instance is unsafe because IoTDB
> > uses NodeRef as identity-based keys in analysis maps.
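
The identity-key hazard can be shown with a small standalone example. This is a sketch under assumptions: java.util.IdentityHashMap stands in for a NodeRef-keyed analysis map, and IdentityKeySketch, Node, and recordOwners are hypothetical names rather than IoTDB classes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Shows why a shared node instance collides in an identity-keyed map. */
final class IdentityKeySketch {

  /** Stand-in for an AST expression node. */
  static final class Node {
    final String text;

    Node(String text) {
      this.text = text;
    }

    Node deepCopy() {
      return new Node(text);
    }
  }

  /**
   * Records which SELECT item position owns each node occurrence, keyed by
   * reference identity. A node that appears twice keeps only the last position.
   */
  static Map<Node, Integer> recordOwners(Node... occurrences) {
    Map<Node, Integer> owners = new IdentityHashMap<>();
    for (int position = 0; position < occurrences.length; position++) {
      owners.put(occurrences[position], position);
    }
    return owners;
  }
}
```

Passing the same instance twice leaves a single entry whose value is overwritten by the later position, whereas passing the instance and its deep copy leaves two independent entries. That is the reason each alias replacement needs its own copy.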
> >
> > AllColumns and COLUMNS(...)
> >
> > AllColumns should not register aliases for LCA.
> >
> > A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
> > today, but its alias should not be registered as a reusable LCA alias
> > because it may expand to multiple output columns.
> >
> > SELECT COLUMNS('s.*') AS x, x + 1 AS y
> > FROM table1;
> >
> > Here, x should not be treated as a unique alias definition from
> > COLUMNS(...).
> >
> > Aggregation and window functions
> >
> > LCA may reference previous aggregate expressions:
> >
> > SELECT avg(s1) AS a, a + 1 AS b
> > FROM table1;
> >
> > This should be rewritten to:
> >
> > SELECT avg(s1) AS a, avg(s1) + 1 AS b
> > FROM table1;
> >
> > Existing aggregation validity checks should still apply after rewriting.
> >
> > For window functions, supporting alias references is preferred:
> >
> > SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> > FROM table1;
> >
> > Since LCA deep copy creates new FunctionCall nodes, any copied window
> > functions need their resolved window metadata registered again in
> > Analysis. If this is not supported in the first implementation, we should
> > reject such aliases explicitly with a clear error instead of failing
> > silently.
> >
> > GROUP BY and ORDER BY compatibility
> >
> > This should remain compatible with Part 1 alias reuse behavior.
> >
> > SELECT date_bin(1h, time) AS hour_time, avg(s1)
> > FROM table1
> > GROUP BY hour_time
> > ORDER BY hour_time;
> >
> > GROUP BY <alias> should continue to resolve to the corresponding select
> > expression, while ORDER BY <alias> can continue to resolve to the output
> > field reference.
> >
> > Because SELECT output expressions are already rewritten, Part 1 and Part
> > 2 should compose naturally:
> >
> > SELECT s1 AS x, x + 1 AS y, count(*)
> > FROM table1
> > GROUP BY y;
> >
> > GROUP BY y should resolve to:
> >
> > s1 + 1
> >
> > Suggested tests
> >
> > I plan to add or update tests in SelectAliasReuseTest for:
> >
> > SELECT s1 AS x, x + 1 AS y FROM table1;
> > SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> > SELECT s1 AS x, x + x AS y FROM table1;
> > SELECT y + 1 AS x, s1 AS y FROM table1;
> > SELECT x + 1 AS x FROM table1;
> > SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> > SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> > SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> > SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> > SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> > SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> > SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> > SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> > SELECT s1 AS x FROM table1 WHERE x > 1;
> > SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> > SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> > SELECT s1 AS x, x FROM table1;
> >
> > Please let me know whether this direction looks reasonable, especially
> > the resolution priority, duplicate alias handling, and the preferred
> > behavior for window function aliases.
> >
> > *Best regards, Bryan Yang(杨易达)*
> >
>
