Hi Bryan,

Thanks for your contribution. I'll review it when I'm free.

Best regards,
------------------------
Yuan Tian

On Wed, Jun 17, 2026 at 2:53 PM Bryan Yang <[email protected]> wrote:

> Hi Yuan Tian,
>
> I have completed the implementation for Issue #17797 Part 2: supporting
> lateral column aliases in the table-model SELECT list.
>
> The PR is available here:
>
> https://github.com/apache/iotdb/pull/17960
>
> This PR adds left-to-right SELECT-list alias resolution, keeps local input
> columns at higher priority than SELECT aliases, avoids rewriting qualified
> expressions and subqueries, preserves WHERE/HAVING semantics, and keeps the
> existing ORDER BY alias behavior. It also includes analyzer tests and a
> plan-level test to ensure reused aliases do not cause duplicated projection
> computation.
>
> I have verified the changes with:
> ./mvnw spotless:apply -pl iotdb-core/datanode
> ./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest
>
> The PR is ready for review. It will close #17797 after being merged.
>
> Best regards,
> Bryan Yang (杨易达)
>
> On Tue, Jun 16, 2026 at 2:22 PM Yuan Tian <[email protected]> wrote:
>
> > Hi Bryan,
> >
> >
> > Thanks for the detailed proposal. Overall, the direction looks
> > reasonable to me. Reusing the existing SELECT syntax and resolving LCA
> > during analysis also seems consistent with the current alias-reuse
> > work for GROUP BY / ORDER BY.
> >
> > A few comments after looking at the current analyzer and planner code:
> >
> > 1. The proposed resolution priority looks good to me:
> >
> > local source column > visible previous aliases > existing analyzer
> > resolution
> >
> > This matches the current GROUP BY alias behavior, where local input
> > columns take precedence over SELECT aliases, while outer-scope columns
> > should not block aliases defined in the current SELECT list.
> >
> > 2. I agree that duplicate aliases should be treated as ambiguous
> > instead of overwriting each other. This is also consistent with the
> > current SELECT alias reuse behavior.
> >
> > 3. Please be careful with performance. For queries that do not use
> > LCA, we should avoid adding noticeable overhead. For example, if there
> > are no visible previous aliases, the rewrite step should be skipped
> > directly. Also, alias lookup should probably use a canonical-name
> > map/multimap instead of scanning a list for every Identifier.
> >
> > 4. For queries that do use LCA, deep-copying the defining expression
> > is necessary because Analysis uses NodeRef identity-based maps.
> > However, pure expression inlining may duplicate scalar computations in
> > the generated plan. For example:
> >
> > SELECT expensive(s1) AS x, x + 1 AS y
> > FROM table1;
> >
> > If this is rewritten as:
> >
> > SELECT expensive(s1) AS x, expensive(s1) + 1 AS y
> > FROM table1;
> >
> > then the scalar expression may be evaluated twice unless the planner
> > later deduplicates it. This is especially worth checking for chained
> > aliases, where repeated deep copies could cause the expression tree to
> > grow quickly. I think we should add plan-level tests for this, not
> > only analyzer tests.
> >
> > 5. I agree that the rewriter must not enter SubqueryExpression, and
> > DereferenceExpression needs special handling. In particular,
> > expressions like t.x, table1.x, and x.y should not rewrite the base
> > identifier as an alias reference.
> >
> > 6. For window function aliases, I agree with your note: either copied
> > window functions must have their resolved window metadata registered
> > again, or the first implementation should reject this case explicitly
> > with a clear error message.
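Comments 2 and 3 above can be combined in one small structure. The following is a minimal sketch only: `VisibleAliasIndex` and its methods are hypothetical names, not existing IoTDB classes, and canonicalization is assumed to be plain lower-casing. It keeps every definition of an alias under its canonical name, so lookup is a hash probe instead of a list scan, duplicates surface as ambiguity, and an empty index gives the caller a cheap reason to skip the rewrite step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical index of visible SELECT aliases, keyed by canonical name. */
class VisibleAliasIndex {
  // canonical alias name -> output positions of every SELECT item that defined it
  private final Map<String, List<Integer>> positionsByName = new HashMap<>();

  /** Registers an alias after its defining SELECT item has been analyzed. */
  void add(String aliasName, int outputPosition) {
    positionsByName
        .computeIfAbsent(canonical(aliasName), k -> new ArrayList<>())
        .add(outputPosition);
  }

  /** Fast path: with no visible aliases the rewrite step can be skipped entirely. */
  boolean isEmpty() {
    return positionsByName.isEmpty();
  }

  /**
   * Returns the output position of the unique alias with this name, or -1 if
   * there is none. Duplicate definitions are reported as ambiguous instead of
   * silently overwriting each other.
   */
  int lookup(String identifier) {
    List<Integer> positions =
        positionsByName.getOrDefault(canonical(identifier), Collections.emptyList());
    if (positions.size() > 1) {
      throw new IllegalStateException("Column alias '" + identifier + "' is ambiguous");
    }
    return positions.isEmpty() ? -1 : positions.get(0);
  }

  // Assumption: canonicalization is lower-casing; the real analyzer may differ.
  private static String canonical(String name) {
    return name.toLowerCase(Locale.ROOT);
  }
}
```

A caller would check `isEmpty()` once per SELECT item and only walk the expression tree when it returns false.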
> >
> > One small wording point: I think “Forward references are not
> > supported” is the correct English term here, because the expression is
> > referring to an alias defined later in the SELECT list. To avoid
> > ambiguity, maybe we can phrase it as:
> >
> > “Forward references, i.e., references to aliases defined later in the
> > SELECT list, are not supported.”
> >
> > Best regards,
> > ---------------
> > Yuan Tian
> >
> >
> > On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:
> >
> > > *Hi IoTDB community,*
> > >
> > > I would like to propose the implementation plan for Part 2 of
> > > apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
> > > model SELECT list.
> > >
> > > Issue: https://github.com/apache/iotdb/issues/17797
> > >
> > > LCA allows a later SELECT item to reference an explicit alias defined by an
> > > earlier SELECT item.
> > >
> > > SELECT s1 AS x, x + 1 AS y
> > > FROM table1;
> > >
> > > This should be analyzed as:
> > >
> > > SELECT s1 AS x, s1 + 1 AS y
> > > FROM table1;
> > >
> > > The implementation does not require new keywords or grammar changes. The
> > > existing SELECT syntax can be reused, and LCA can be resolved during
> > > analysis by rewriting expressions before type analysis, aggregation
> > > analysis, output scope computation, and planning.
> > >
> > > Proposed semantics
> > >
> > > LCA is resolved from left to right inside the SELECT list.
> > >
> > > SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> > > FROM table1;
> > >
> > > is equivalent to:
> > >
> > > SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> > > FROM table1;
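The chain above references each previous alias once, so the inlined expression grows by a constant amount per item. If an item references the previous alias twice, pure inlining doubles the tree at every step. The following is a minimal sketch of that arithmetic only: the class and method names are hypothetical, and it counts AST nodes under the assumption that every alias reference becomes a full deep copy, without modelling anything the planner may deduplicate later.

```java
/** Node-count arithmetic for pure inlining of chained aliases (illustrative only). */
final class InlineGrowthSketch {
  /** Chain: s1 AS a1, a1 + 1 AS a2, a2 + 1 AS a3, ... (one reference per item). */
  static long singleReferenceSize(int chainLength) {
    long size = 1; // a1 is the bare column s1
    for (int i = 2; i <= chainLength; i++) {
      size += 2; // previous expression plus one operator node and one literal node
    }
    return size;
  }

  /** Chain: s1 AS a1, a1 + a1 AS a2, a2 + a2 AS a3, ... (two references per item). */
  static long doubleReferenceSize(int chainLength) {
    long size = 1; // a1 is the bare column s1
    for (int i = 2; i <= chainLength; i++) {
      size = 2 * size + 1; // two deep copies of the previous expression plus one operator
    }
    return size;
  }
}
```

For a chain of length n the first method returns 2n - 1 nodes and the second 2^n - 1, which is why the doubly-referenced case is the one to cover in plan-level tests.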
> > >
> > > Forward references are not supported:
> > >
> > > SELECT y + 1 AS x, s1 AS y
> > > FROM table1;
> > >
> > > The alias of the current SELECT item is not visible to its own expression:
> > >
> > > SELECT x + 1 AS x
> > > FROM table1;
> > >
> > > Only unqualified identifiers are considered alias references. Qualified
> > > expressions such as t.x, table1.x, or x.y should continue to use the
> > > existing qualified column resolution rules.
> > >
> > > The recommended resolution priority is:
> > >
> > > local source column > visible previous aliases > existing analyzer
> > > resolution
> > >
> > > This means local input columns should take precedence over previous
> > > aliases, while outer query columns should not block visible aliases from
> > > the current SELECT list.
> > >
> > > Duplicate aliases should not overwrite each other. If multiple previous
> > > aliases have the same canonical name and there is no same-name local source
> > > column, the analyzer should report:
> > >
> > > Column alias 'x' is ambiguous
> > >
> > > Non-goals
> > >
> > > This change should not alter WHERE or HAVING semantics.
> > >
> > > SELECT s1 AS x FROM table1 WHERE x > 1;
> > > SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> > >
> > > These should still resolve x or a only through the input scope, not through
> > > SELECT aliases.
> > >
> > > LCA should also not enter subquery scopes:
> > >
> > > SELECT s1 AS x, (SELECT x FROM table1) AS y
> > > FROM table1;
> > >
> > > The x inside the subquery should not be rewritten using the outer SELECT
> > > alias.
> > >
> > > Analyzer changes
> > >
> > > The entry point should be StatementAnalyzer.analyzeSelect.
> > >
> > > The SELECT list can be processed left to right. For each normal
> > > SingleColumn:
> > >
> > > 1. Rewrite the original expression using visible previous aliases.
> > > 2. Register window metadata for any newly copied window functions.
> > > 3. Analyze the rewritten expression.
> > > 4. Record the rewritten expression in SelectAnalysis output expressions.
> > > 5. Record SingleColumn -> rewritten expression mapping.
> > > 6. Add the current explicit alias to visible aliases only after its
> > > expression is rewritten and analyzed.
> > >
> > > Pseudo-code:
> > >
> > > List<SelectAlias> visibleAliases = new ArrayList<>();
> > > int outputPosition = 0;
> > >
> > > for (SelectItem item : node.getSelect().getSelectItems()) {
> > >   if (item instanceof SingleColumn) {
> > >     SingleColumn singleColumn = (SingleColumn) item;
> > >
> > >     // Rewrite against aliases defined strictly earlier in the SELECT list.
> > >     Expression originalExpression = singleColumn.getExpression();
> > >     Expression rewrittenExpression =
> > >         rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);
> > >
> > >     // Copied window functions are new nodes and need their metadata registered.
> > >     resolveWindowFunctionsInExpression(node, rewrittenExpression);
> > >
> > >     analyzeSelectSingleColumn(
> > >         rewrittenExpression,
> > >         node,
> > >         scope,
> > >         outputExpressionBuilder,
> > >         selectExpressionBuilder);
> > >
> > >     singleColumnOutputExpressions.put(
> > >         NodeRef.of(singleColumn),
> > >         ImmutableList.of(rewrittenExpression));
> > >
> > >     // Register the alias only now, so it is invisible to its own expression.
> > >     if (singleColumn.getAlias().isPresent()
> > >         && !containsColumnsFunction(singleColumn)) {
> > >       Identifier alias = singleColumn.getAlias().get();
> > >       visibleAliases.add(
> > >           new SelectAlias(
> > >               alias.getCanonicalValue(),
> > >               outputPosition,
> > >               rewrittenExpression));
> > >     }
> > >
> > >     outputPosition++;
> > >   }
> > > }
> > >
> > > SelectAlias can be extended to keep:
> > >
> > > canonicalName
> > > position
> > > rewrittenExpression
> > >
> > > SelectAnalysis should keep semantic output expressions per SingleColumn,
> > > for example:
> > >
> > > NodeRef<SingleColumn> -> List<Expression>
> > >
> > > The original AST should remain unchanged because SingleColumn.expression is
> > > final.
> > >
> > > Expression rewriting
> > >
> > > LCA rewriting should replace only unqualified Identifiers.
> > >
> > > When an identifier is encountered:
> > >
> > > 1. If it resolves to a local input column, keep it unchanged.
> > > 2. Otherwise, look up visible previous aliases.
> > > 3. If exactly one alias matches, replace the identifier with a deep
> > > copy of that alias's rewritten expression.
> > > 4. If multiple aliases match, report alias ambiguity.
> > > 5. If no alias matches, keep the identifier unchanged and let existing
> > > analysis handle it.
> > >
> > > The rewriter should not traverse into SubqueryExpression.
> > >
> > > It should also handle DereferenceExpression carefully so that expressions
> > > such as t.x, table1.x, and x.y are not rewritten as alias references.
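The five identifier rules, together with the subquery and dereference restrictions, can be sketched over a toy AST. This is a minimal illustration under stated assumptions: `Expr`, `Ident`, `Deref`, `Binary`, `Subquery`, and `LcaRewriterSketch` are hypothetical stand-ins rather than IoTDB's node classes, names are assumed to be already canonical, and each alias maps to its already rewritten defining expression.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy AST. These node classes are illustrative stand-ins, not IoTDB's real ones.
abstract class Expr {}

final class Ident extends Expr {
  final String name;
  Ident(String name) { this.name = name; }
}

final class Deref extends Expr { // qualified reference such as t.x
  final Ident base;
  final String field;
  Deref(Ident base, String field) { this.base = base; this.field = field; }
}

final class Binary extends Expr {
  final String op;
  final Expr left;
  final Expr right;
  Binary(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
}

final class Subquery extends Expr { // treated as opaque: its scope is never entered
  final String sql;
  Subquery(String sql) { this.sql = sql; }
}

final class LcaRewriterSketch {
  private final Set<String> localColumns;        // columns of the local input scope
  private final Map<String, List<Expr>> aliases; // alias name -> defining expressions

  LcaRewriterSketch(Set<String> localColumns, Map<String, List<Expr>> aliases) {
    this.localColumns = localColumns;
    this.aliases = aliases;
  }

  Expr rewrite(Expr e) {
    if (e instanceof Ident) {
      String name = ((Ident) e).name;
      if (localColumns.contains(name)) {
        return e; // rule 1: a local input column wins over any alias
      }
      List<Expr> matches = aliases.getOrDefault(name, Collections.emptyList());
      if (matches.size() > 1) {
        throw new IllegalStateException("Column alias '" + name + "' is ambiguous");
      }
      // one match: substitute a fresh copy; no match: leave it for existing analysis
      return matches.isEmpty() ? e : deepCopy(matches.get(0));
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return new Binary(b.op, rewrite(b.left), rewrite(b.right));
    }
    return e; // Deref and Subquery are returned untouched
  }

  static Expr deepCopy(Expr e) {
    if (e instanceof Ident) {
      return new Ident(((Ident) e).name);
    }
    if (e instanceof Deref) {
      Deref d = (Deref) e;
      return new Deref(new Ident(d.base.name), d.field);
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return new Binary(b.op, deepCopy(b.left), deepCopy(b.right));
    }
    return new Subquery(((Subquery) e).sql);
  }

  static String show(Expr e) {
    if (e instanceof Ident) {
      return ((Ident) e).name;
    }
    if (e instanceof Deref) {
      Deref d = (Deref) e;
      return d.base.name + "." + d.field;
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return "(" + show(b.left) + " " + b.op + " " + show(b.right) + ")";
    }
    return "(" + ((Subquery) e).sql + ")";
  }
}
```

With `x` defined as `s1`, rewriting `x + x` yields `(s1 + s1)` with two distinct `Ident` instances, while `x.y` and a subquery mentioning `x` come back as the same objects that went in.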
> > >
> > > Each alias replacement must create a deep copy of the defining expression.
> > > Reusing the same expression instance is unsafe because IoTDB uses NodeRef
> > > as identity-based keys in analysis maps.
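The identity-key hazard can be shown with the JDK alone. This is a minimal sketch, assuming `java.util.IdentityHashMap` as a stand-in for a NodeRef-keyed analysis map; `ExprNode` and `IdentityKeySketch` are hypothetical names. When the same instance appears under two SELECT items, the second registration overwrites the first; a deep copy keeps the two entries apart.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for an AST node; it deliberately does not override equals(). */
final class ExprNode {
  final String text;
  ExprNode(String text) { this.text = text; }
  ExprNode deepCopy() { return new ExprNode(text); }
}

final class IdentityKeySketch {
  /** Records one entry per node, the way an identity-keyed analysis map would. */
  static int distinctEntries(ExprNode first, ExprNode second) {
    Map<ExprNode, String> analysis = new IdentityHashMap<>();
    analysis.put(first, "analyzed under SELECT item 1");
    analysis.put(second, "analyzed under SELECT item 2"); // overwrites if same instance
    return analysis.size();
  }
}
```

Passing the same instance twice leaves a single entry, while passing `node` and `node.deepCopy()` leaves two.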
> > >
> > > AllColumns and COLUMNS(...)
> > >
> > > AllColumns should not register aliases for LCA.
> > >
> > > A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
> > > today, but its alias should not be registered as a reusable LCA alias
> > > because it may expand to multiple output columns.
> > >
> > > SELECT COLUMNS('s.*') AS x, x + 1 AS y
> > > FROM table1;
> > >
> > > Here, x should not be treated as a unique alias definition from
> > > COLUMNS(...).
> > >
> > > Aggregation and window functions
> > >
> > > LCA may reference previous aggregate expressions:
> > >
> > > SELECT avg(s1) AS a, a + 1 AS b
> > > FROM table1;
> > >
> > > This should be rewritten to:
> > >
> > > SELECT avg(s1) AS a, avg(s1) + 1 AS b
> > > FROM table1;
> > >
> > > Existing aggregation validity checks should still apply after rewriting.
> > >
> > > For window functions, supporting alias references is preferred:
> > >
> > > SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> > > FROM table1;
> > >
> > > Since LCA deep copy creates new FunctionCall nodes, any copied window
> > > functions need their resolved window metadata registered again in Analysis.
> > > If this is not supported in the first implementation, we should reject such
> > > aliases explicitly with a clear error instead of failing silently.
> > >
> > > GROUP BY and ORDER BY compatibility
> > >
> > > This should remain compatible with Part 1 alias reuse behavior.
> > >
> > > SELECT date_bin(1h, time) AS hour_time, avg(s1)
> > > FROM table1
> > > GROUP BY hour_time
> > > ORDER BY hour_time;
> > >
> > > GROUP BY <alias> should continue to resolve to the corresponding select
> > > expression, while ORDER BY <alias> can continue to resolve to the output
> > > field reference.
> > >
> > > Because SELECT output expressions are already rewritten, Part 1 and Part 2
> > > should compose naturally:
> > >
> > > SELECT s1 AS x, x + 1 AS y, count(*)
> > > FROM table1
> > > GROUP BY y;
> > >
> > > GROUP BY y should resolve to:
> > >
> > > s1 + 1
> > >
> > > Suggested tests
> > >
> > > I plan to add or update tests in SelectAliasReuseTest for:
> > >
> > > SELECT s1 AS x, x + 1 AS y FROM table1;
> > > SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> > > SELECT s1 AS x, x + x AS y FROM table1;
> > > SELECT y + 1 AS x, s1 AS y FROM table1;
> > > SELECT x + 1 AS x FROM table1;
> > > SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> > > SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> > > SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> > > SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> > > SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> > > SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> > > SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> > > SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> > > SELECT s1 AS x FROM table1 WHERE x > 1;
> > > SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> > > SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> > > SELECT s1 AS x, x FROM table1;
> > >
> > > Please let me know whether this direction looks reasonable, especially the
> > > resolution priority, duplicate alias handling, and the preferred behavior
> > > for window function aliases.
> > >
> > > *Best regards, Bryan Yang(杨易达)*
> > >
> >
>
