Hi Bryan,

Thanks for the detailed proposal. Overall, the direction looks
reasonable to me. Reusing the existing SELECT syntax and resolving LCA
during analysis also seems consistent with the current alias-reuse
work for GROUP BY / ORDER BY.

A few comments after looking at the current analyzer and planner code:

1. The proposed resolution priority looks good to me:

local source column > visible previous aliases > existing analyzer resolution

This matches the current GROUP BY alias behavior, where local input
columns take precedence over SELECT aliases, while outer-scope columns
should not block aliases defined in the current SELECT list.

2. I agree that duplicate aliases should be treated as ambiguous
instead of overwriting each other. This is also consistent with the
current SELECT alias reuse behavior.

3. Please be careful with performance. For queries that do not use
LCA, we should avoid adding noticeable overhead. For example, if there
are no visible previous aliases, the rewrite step should be skipped
directly. Also, alias lookup should probably use a canonical-name
map/multimap instead of scanning a list for every Identifier.
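
To make the lookup suggestion concrete, here is a minimal sketch of such
an index. LcaAliasIndex and its methods are hypothetical names, not
existing IoTDB classes, and lower-casing is only an assumed stand-in for
whatever canonicalization Identifier really applies:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not an existing IoTDB class.
final class LcaAliasIndex {
  enum Match { NONE, UNIQUE, AMBIGUOUS }

  // canonical alias name -> positions of the SELECT items defining it
  private final Map<String, List<Integer>> positionsByName = new HashMap<>();

  // Fast path: if no alias is visible yet, the caller can skip the rewrite.
  boolean isEmpty() {
    return positionsByName.isEmpty();
  }

  void register(String aliasName, int selectPosition) {
    positionsByName
        .computeIfAbsent(canonical(aliasName), k -> new ArrayList<>())
        .add(selectPosition);
  }

  // One hash lookup per identifier instead of a scan over all aliases.
  Match lookup(String identifier) {
    List<Integer> positions = positionsByName.get(canonical(identifier));
    if (positions == null) {
      return Match.NONE;
    }
    return positions.size() == 1 ? Match.UNIQUE : Match.AMBIGUOUS;
  }

  // Assumption: lower-casing stands in for the real canonicalization rule.
  private static String canonical(String name) {
    return name.toLowerCase(Locale.ROOT);
  }
}
```

Keeping every defining position per name, rather than overwriting, is
what lets the same structure report the ambiguity case from point 2.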

4. For queries that do use LCA, deep-copying the defining expression
is necessary because Analysis uses NodeRef identity-based maps.
However, pure expression inlining may duplicate scalar computations in
the generated plan. For example:

SELECT expensive(s1) AS x, x + 1 AS y
FROM table1;

If this is rewritten as:

SELECT expensive(s1) AS x, expensive(s1) + 1 AS y
FROM table1;

then the scalar expression may be evaluated twice unless the planner
later deduplicates it. This is especially worth checking for chained
aliases, where repeated deep copies can make the expression tree grow
exponentially with the chain length, for example when each alias
references the previous one twice. I think we should add plan-level
tests for this, not only analyzer tests.
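
As a rough illustration of the growth concern, the sketch below models
a chain such as SELECT s1 AS a0, a0 + a0 AS a1, a1 + a1 AS a2, ... under
pure inlining. Node is a toy tree, not IoTDB's Expression, and
inlinedSize is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of pure expression inlining; not IoTDB code.
final class LcaInlineGrowth {
  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      for (Node kid : kids) {
        children.add(kid);
      }
    }

    // Every alias reference gets its own copy of the defining expression.
    Node deepCopy() {
      Node copy = new Node(label);
      for (Node child : children) {
        copy.children.add(child.deepCopy());
      }
      return copy;
    }

    int size() {
      int total = 1;
      for (Node child : children) {
        total += child.size();
      }
      return total;
    }
  }

  // Node count of the last alias after chainLength steps of a(i) = a(i-1) + a(i-1).
  static int inlinedSize(int chainLength) {
    Node current = new Node("s1");
    for (int i = 0; i < chainLength; i++) {
      current = new Node("+", current.deepCopy(), current.deepCopy());
    }
    return current.size();
  }
}
```

In this model the size after n steps is 2^(n+1) - 1, so a chain of only
10 aliases already yields 2047 nodes. A plan-level test could assert on
the number of distinct scalar computations that survive planning.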

5. I agree that the rewriter must not enter SubqueryExpression, and
DereferenceExpression needs special handling. In particular,
expressions like t.x, table1.x, and x.y should not rewrite the base
identifier as an alias reference.
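
A small sketch of the traversal rule I have in mind, on a toy AST. The
classes below (Ident, Deref, Add, Subquery) are illustrative stand-ins
rather than the real Expression hierarchy, and the alias map is assumed
to contain only unambiguous aliases that were already rewritten:

```java
import java.util.Map;
import java.util.Set;

// Toy AST for illustration; these are not IoTDB's Expression classes.
final class LcaRewriteSketch {
  abstract static class Expr {}

  static final class Ident extends Expr {
    final String name;
    Ident(String name) { this.name = name; }
  }

  static final class Deref extends Expr {
    final Expr base;
    final String field;
    Deref(Expr base, String field) { this.base = base; this.field = field; }
  }

  static final class Add extends Expr {
    final Expr left;
    final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
  }

  static final class Subquery extends Expr {
    final String sql;
    Subquery(String sql) { this.sql = sql; }
  }

  // aliases is assumed to hold only unambiguous aliases, already rewritten.
  static Expr rewrite(Expr e, Set<String> localColumns, Map<String, Expr> aliases) {
    if (e instanceof Ident) {
      String name = ((Ident) e).name;
      if (localColumns.contains(name) || !aliases.containsKey(name)) {
        return e; // local column wins, or not an alias: leave for normal analysis
      }
      return copy(aliases.get(name)); // fresh nodes for identity-keyed maps
    }
    if (e instanceof Add) {
      Add add = (Add) e;
      return new Add(
          rewrite(add.left, localColumns, aliases),
          rewrite(add.right, localColumns, aliases));
    }
    // Deref (t.x, x.y) and Subquery are returned untouched: no descent.
    return e;
  }

  static Expr copy(Expr e) {
    if (e instanceof Ident) return new Ident(((Ident) e).name);
    if (e instanceof Deref) return new Deref(copy(((Deref) e).base), ((Deref) e).field);
    if (e instanceof Add) return new Add(copy(((Add) e).left), copy(((Add) e).right));
    return new Subquery(((Subquery) e).sql);
  }

  static String show(Expr e) {
    if (e instanceof Ident) return ((Ident) e).name;
    if (e instanceof Deref) return show(((Deref) e).base) + "." + ((Deref) e).field;
    if (e instanceof Add) return "(" + show(((Add) e).left) + " + " + show(((Add) e).right) + ")";
    return "(" + ((Subquery) e).sql + ")";
  }
}
```

With x defined as s1 and s1 as the only local column, x + s1 comes back
as (s1 + s1), while x.y and (SELECT x FROM table1) are returned as the
same instances. If table1 also had a column named x, the bare x would be
left alone as well.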

6. For window function aliases, I agree with your note: either copied
window functions must have their resolved window metadata registered
again, or the first implementation should reject this case explicitly
with a clear error message.

One small wording point: I think “Forward references are not
supported” is the correct English term here, because the expression is
referring to an alias defined later in the SELECT list. To avoid
ambiguity, maybe we can phrase it as:

“Forward references, i.e., references to aliases defined later in the
SELECT list, are not supported.”

Best regards,
---------------
Yuan Tian


On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:

> *Hi IoTDB community,*
>
> I would like to propose the implementation plan for Part 2 of
> apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
> model SELECT list. Issue: https://github.com/apache/iotdb/issues/17797
>
> LCA allows a later SELECT item to reference an explicit alias defined by an
> earlier SELECT item.
>
> SELECT s1 AS x, x + 1 AS y
> FROM table1;
>
> This should be analyzed as:
>
> SELECT s1 AS x, s1 + 1 AS y
> FROM table1;
>
> The implementation does not require new keywords or grammar changes. The
> existing SELECT syntax can be reused, and LCA can be resolved during
> analysis by rewriting expressions before type analysis, aggregation
> analysis, output scope computation, and planning.
>
> Proposed semantics
>
> LCA is resolved from left to right inside the SELECT list.
>
> SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> FROM table1;
>
> is equivalent to:
>
> SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> FROM table1;
>
> Forward references are not supported:
>
> SELECT y + 1 AS x, s1 AS y
> FROM table1;
>
> The alias of the current SELECT item is not visible to its own expression:
>
> SELECT x + 1 AS x
> FROM table1;
>
> Only unqualified identifiers are considered alias references. Qualified
> expressions such as t.x, table1.x, or x.y should continue to use the
> existing qualified column resolution rules.
>
> The recommended resolution priority is:
>
> local source column > visible previous aliases > existing analyzer resolution
>
> This means local input columns should take precedence over previous
> aliases, while outer query columns should not block visible aliases from
> the current SELECT list.
>
> Duplicate aliases should not overwrite each other. If multiple previous
> aliases have the same canonical name and there is no same-name local source
> column, the analyzer should report:
>
> Column alias 'x' is ambiguous
>
> Non-goals
>
> This change should not alter WHERE or HAVING semantics.
>
> SELECT s1 AS x FROM table1 WHERE x > 1;
> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
>
> These should still resolve x or a only through the input scope, not through
> SELECT aliases.
>
> LCA should also not enter subquery scopes:
>
> SELECT s1 AS x, (SELECT x FROM table1) AS y
> FROM table1;
>
> The x inside the subquery should not be rewritten using the outer SELECT
> alias.
>
> Analyzer changes
>
> The entry point should be StatementAnalyzer.analyzeSelect.
>
> The SELECT list can be processed left to right. For each normal
> SingleColumn:
>
> 1. Rewrite the original expression using visible previous aliases.
> 2. Register window metadata for any newly copied window functions.
> 3. Analyze the rewritten expression.
> 4. Record the rewritten expression in SelectAnalysis output expressions.
> 5. Record SingleColumn -> rewritten expression mapping.
> 6. Add the current explicit alias to visible aliases only after its
> expression is rewritten and analyzed.
>
> Pseudo-code:
>
> List<SelectAlias> visibleAliases = new ArrayList<>();
>
> for (SelectItem item : node.getSelect().getSelectItems()) {
>   if (item instanceof SingleColumn) {
>     SingleColumn singleColumn = (SingleColumn) item;
>
>     Expression originalExpression = singleColumn.getExpression();
>     Expression rewrittenExpression =
>         rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);
>
>     resolveWindowFunctionsInExpression(node, rewrittenExpression);
>
>     analyzeSelectSingleColumn(
>         rewrittenExpression,
>         node,
>         scope,
>         outputExpressionBuilder,
>         selectExpressionBuilder);
>
>     singleColumnOutputExpressions.put(
>         NodeRef.of(singleColumn),
>         ImmutableList.of(rewrittenExpression));
>
>     if (singleColumn.getAlias().isPresent()
>         && !containsColumnsFunction(singleColumn)) {
>       Identifier alias = singleColumn.getAlias().get();
>       visibleAliases.add(
>           new SelectAlias(
>               alias.getCanonicalValue(),
>               outputPosition,
>               rewrittenExpression));
>     }
>
>     outputPosition++;
>   }
> }
>
> SelectAlias can be extended to keep:
>
> canonicalName
> position
> rewrittenExpression
>
> SelectAnalysis should keep semantic output expressions per SingleColumn,
> for example:
>
> NodeRef<SingleColumn> -> List<Expression>
>
> The original AST should remain unchanged because SingleColumn.expression is
> final.
>
> Expression rewriting
>
> LCA rewriting should replace only unqualified Identifiers.
>
> When an identifier is encountered:
>
> 1. If it resolves to a local input column, keep it unchanged.
> 2. Otherwise, look up visible previous aliases.
> 3. If exactly one alias matches, replace the identifier with a deep
> copy of that alias's rewritten expression.
> 4. If multiple aliases match, report alias ambiguity.
> 5. If no alias matches, keep the identifier unchanged and let existing
> analysis handle it.
>
> The rewriter should not traverse into SubqueryExpression.
>
> It should also handle DereferenceExpression carefully so that expressions
> such as t.x, table1.x, and x.y are not rewritten as alias references.
>
> Each alias replacement must create a deep copy of the defining expression.
> Reusing the same expression instance is unsafe because IoTDB uses NodeRef
> as identity-based keys in analysis maps.
>
> AllColumns and COLUMNS(...)
>
> AllColumns should not register aliases for LCA.
>
> A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
> today, but its alias should not be registered as a reusable LCA alias
> because it may expand to multiple output columns.
>
> SELECT COLUMNS('s.*') AS x, x + 1 AS y
> FROM table1;
>
> Here, x should not be treated as a unique alias definition from
> COLUMNS(...).
>
> Aggregation and window functions
>
> LCA may reference previous aggregate expressions:
>
> SELECT avg(s1) AS a, a + 1 AS b
> FROM table1;
>
> This should be rewritten to:
>
> SELECT avg(s1) AS a, avg(s1) + 1 AS b
> FROM table1;
>
> Existing aggregation validity checks should still apply after rewriting.
>
> For window functions, supporting alias references is preferred:
>
> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> FROM table1;
>
> Since LCA deep copy creates new FunctionCall nodes, any copied window
> functions need their resolved window metadata registered again in Analysis.
> If this is not supported in the first implementation, we should reject such
> aliases explicitly with a clear error instead of failing silently.
>
> GROUP BY and ORDER BY compatibility
>
> This should remain compatible with Part 1 alias reuse behavior.
>
> SELECT date_bin(1h, time) AS hour_time, avg(s1)
> FROM table1
> GROUP BY hour_time
> ORDER BY hour_time;
>
> GROUP BY <alias> should continue to resolve to the corresponding select
> expression, while ORDER BY <alias> can continue to resolve to the output
> field reference.
>
> Because SELECT output expressions are already rewritten, Part 1 and Part 2
> should compose naturally:
>
> SELECT s1 AS x, x + 1 AS y, count(*)
> FROM table1
> GROUP BY y;
>
> GROUP BY y should resolve to:
>
> s1 + 1
>
> Suggested tests
>
> I plan to add or update tests in SelectAliasReuseTest for:
>
> SELECT s1 AS x, x + 1 AS y FROM table1;
> SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> SELECT s1 AS x, x + x AS y FROM table1;
> SELECT y + 1 AS x, s1 AS y FROM table1;
> SELECT x + 1 AS x FROM table1;
> SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> SELECT s1 AS x FROM table1 WHERE x > 1;
> SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> SELECT s1 AS x, x FROM table1;
>
> Please let me know whether this direction looks reasonable, especially the
> resolution priority, duplicate alias handling, and the preferred behavior
> for window function aliases.
>
> Best regards,
> Bryan Yang (杨易达)
>
