*Hi IoTDB community,*

I would like to propose the implementation plan for Part 2 of
apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the SELECT list
of the table model (issue: https://github.com/apache/iotdb/issues/17797).

LCA allows a later SELECT item to reference an explicit alias defined by an
earlier SELECT item.

SELECT s1 AS x, x + 1 AS y
FROM table1;

This should be analyzed as:

SELECT s1 AS x, s1 + 1 AS y
FROM table1;

The implementation does not require new keywords or grammar changes. The
existing SELECT syntax can be reused, and LCA can be resolved during
analysis by rewriting expressions before type analysis, aggregation
analysis, output scope computation, and planning.

Proposed semantics

LCA is resolved from left to right inside the SELECT list.

SELECT s1 AS x, x + 1 AS y, y * 2 AS z
FROM table1;

is equivalent to:

SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
FROM table1;

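Forward references are not supported. In the following query, y is not
resolved to the later alias; it is left to the existing resolution rules: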

SELECT y + 1 AS x, s1 AS y
FROM table1;

The alias of the current SELECT item is not visible to its own expression:

SELECT x + 1 AS x
FROM table1;
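To make these visibility rules concrete, here is a small self-contained Java sketch (Java 16+). It uses a hypothetical toy AST (Ident, Lit, BinOp, Item) instead of IoTDB's Expression classes, assumes aliases are unique and that table1 has no columns named x, y or z, and reuses replacement nodes rather than deep-copying them, so it only illustrates the left-to-right order, not the real rewriter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy AST, not IoTDB's Expression hierarchy.
interface Expr {}

record Ident(String name) implements Expr {
    @Override public String toString() { return name; }
}

record Lit(long value) implements Expr {
    @Override public String toString() { return Long.toString(value); }
}

record BinOp(String op, Expr left, Expr right) implements Expr {
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

// One SELECT item: an expression and an optional explicit alias (null if absent).
record Item(Expr expr, String alias) {}

final class LeftToRight {
    // Rewrites each item using only aliases defined by earlier items.
    // Simplifications: aliases are assumed unique and no local column shadows them.
    static List<Expr> rewriteSelectList(List<Item> items) {
        Map<String, Expr> visible = new HashMap<>();
        List<Expr> rewritten = new ArrayList<>();
        for (Item item : items) {
            Expr expr = substitute(item.expr(), visible);
            rewritten.add(expr);
            // Registered only after the item is rewritten, so it never sees its own alias.
            if (item.alias() != null) {
                visible.put(item.alias().toLowerCase(Locale.ROOT), expr);
            }
        }
        return rewritten;
    }

    private static Expr substitute(Expr e, Map<String, Expr> visible) {
        if (e instanceof Ident id) {
            Expr target = visible.get(id.name().toLowerCase(Locale.ROOT));
            // Unknown names (including forward references) are left for normal analysis.
            return target != null ? target : id;
        }
        if (e instanceof BinOp b) {
            return new BinOp(b.op(), substitute(b.left(), visible), substitute(b.right(), visible));
        }
        return e;
    }
}
```

For the three queries above it produces ((s1 + 1) * 2) for z, while y + 1 in the forward-reference case and x + 1 in the self-reference case come back unchanged and are left to the existing analyzer.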

Only unqualified identifiers are considered alias references. Qualified
expressions such as t.x, table1.x, or x.y should continue to use the
existing qualified column resolution rules.

The recommended resolution priority is:

local source column > visible previous aliases > existing analyzer resolution

This means local input columns should take precedence over previous
aliases, while outer query columns should not block visible aliases from
the current SELECT list.

Duplicate aliases should not overwrite each other. If an identifier matches
multiple previous aliases with the same canonical name and there is no
same-name local source column, the analyzer should report:

Column alias 'x' is ambiguous
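A minimal sketch of the per-identifier decision, assuming a simplified SelectAlias that holds the rewritten expression as a String and hypothetical Resolution, KeepIdentifier and ReplaceWith result types (none of these are existing IoTDB classes). Lowercasing stands in for whatever canonicalization Identifier.getCanonicalValue() performs, and localColumns is assumed to hold canonical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical shapes; the real SelectAlias would hold an Expression, not a String.
record SelectAlias(String canonicalName, int position, String rewrittenExpression) {}

interface Resolution {}
record KeepIdentifier(String reason) implements Resolution {}
record ReplaceWith(String expression) implements Resolution {}

final class AliasResolver {
    // Decides what an unqualified identifier means, using the proposed priority:
    // local source column > visible previous aliases > existing analyzer resolution.
    static Resolution resolve(String identifier, Set<String> localColumns,
                              List<SelectAlias> visibleAliases) {
        String name = identifier.toLowerCase(Locale.ROOT); // stand-in for canonicalization
        if (localColumns.contains(name)) {
            return new KeepIdentifier("local source column");
        }
        // Duplicates are kept side by side, never overwritten, so ambiguity is detectable.
        List<SelectAlias> matches = new ArrayList<>();
        for (SelectAlias alias : visibleAliases) {
            if (alias.canonicalName().equals(name)) {
                matches.add(alias);
            }
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException("Column alias '" + name + "' is ambiguous");
        }
        if (matches.size() == 1) {
            return new ReplaceWith(matches.get(0).rewrittenExpression());
        }
        // Outer query columns, unknown names, etc. are handled exactly as today.
        return new KeepIdentifier("existing analyzer resolution");
    }
}
```

With aliases s1 AS x and s2 AS x, resolving x throws the ambiguity error unless x is also a local column, in which case the identifier is kept.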

Non-goals

This change should not alter WHERE or HAVING semantics.

SELECT s1 AS x FROM table1 WHERE x > 1;
SELECT avg(s1) AS a FROM table1 HAVING a > 1;

These should still resolve x or a only through the input scope, not through
SELECT aliases.

LCA should also not enter subquery scopes:

SELECT s1 AS x, (SELECT x FROM table1) AS y
FROM table1;

The x inside the subquery should not be rewritten using the outer SELECT
alias.

Analyzer changes

The entry point should be StatementAnalyzer.analyzeSelect.

The SELECT list can be processed left to right. For each normal SingleColumn:

1. Rewrite the original expression using visible previous aliases.
2. Register window metadata for any newly copied window functions.
3. Analyze the rewritten expression.
4. Record the rewritten expression in SelectAnalysis output expressions.
5. Record SingleColumn -> rewritten expression mapping.
6. Add the current explicit alias to visible aliases only after its
expression is rewritten and analyzed.

Pseudo-code:

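List<SelectAlias> visibleAliases = new ArrayList<>();
Map<NodeRef<SingleColumn>, List<Expression>> singleColumnOutputExpressions =
    new LinkedHashMap<>();
int outputPosition = 0;

for (SelectItem item : node.getSelect().getSelectItems()) {
  if (item instanceof SingleColumn) {
    SingleColumn singleColumn = (SingleColumn) item;

    // 1. Replace references to earlier aliases; the AST itself is not modified.
    Expression originalExpression = singleColumn.getExpression();
    Expression rewrittenExpression =
        rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);

    // 2. Deep-copied window calls are new nodes, so register their metadata.
    resolveWindowFunctionsInExpression(node, rewrittenExpression);

    // 3-4. Analyze the rewritten expression and add it to the output expressions.
    analyzeSelectSingleColumn(
        rewrittenExpression,
        node,
        scope,
        outputExpressionBuilder,
        selectExpressionBuilder);

    // 5. Record SingleColumn -> rewritten expression.
    singleColumnOutputExpressions.put(
        NodeRef.of(singleColumn),
        ImmutableList.of(rewrittenExpression));

    // 6. Make the alias visible to later items only; COLUMNS(...) aliases are skipped.
    if (singleColumn.getAlias().isPresent()
        && !containsColumnsFunction(singleColumn)) {
      Identifier alias = singleColumn.getAlias().get();
      visibleAliases.add(
          new SelectAlias(
              alias.getCanonicalValue(),
              outputPosition,
              rewrittenExpression));
    }

    outputPosition++;
  }
  // AllColumns is handled as today and registers no LCA aliases.
}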

SelectAlias can be extended to keep:

canonicalName
position
rewrittenExpression

SelectAnalysis should keep semantic output expressions per SingleColumn,
for example:

NodeRef<SingleColumn> -> List<Expression>

The original AST remains unchanged: SingleColumn.expression is final, so the
rewritten expressions live in this side mapping instead.
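A minimal sketch of that side mapping, assuming a NodeRef-like identity wrapper (called Ref here; it is not IoTDB's NodeRef) and strings in place of Expression nodes. It shows that lookups are keyed by the node instance, so two SELECT items with identical text still get separate entries, and the AST node itself is never mutated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical identity wrapper in the spirit of NodeRef: two refs are equal
// only if they wrap the very same object.
final class Ref<T> {
    private final T node;
    private Ref(T node) { this.node = node; }
    static <T> Ref<T> of(T node) { return new Ref<>(node); }
    @Override public boolean equals(Object o) {
        return o instanceof Ref<?> other && other.node == node;
    }
    @Override public int hashCode() { return System.identityHashCode(node); }
}

// Toy AST node whose expression field is final, like SingleColumn.expression.
final class SingleColumn {
    final String expression;
    SingleColumn(String expression) { this.expression = expression; }
}

final class SelectAnalysisSketch {
    // Semantic output expressions per SingleColumn; the AST node is never mutated.
    final Map<Ref<SingleColumn>, List<String>> outputExpressions = new LinkedHashMap<>();

    void record(SingleColumn column, String rewritten) {
        outputExpressions.put(Ref.of(column), List.of(rewritten));
    }

    List<String> outputsFor(SingleColumn column) {
        return outputExpressions.get(Ref.of(column));
    }
}
```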

Expression rewriting

LCA rewriting should replace only unqualified Identifiers.

When an identifier is encountered:

1. If it resolves to a local input column, keep it unchanged.
2. Otherwise, look up visible previous aliases.
3. If exactly one alias matches, replace the identifier with a deep
copy of that alias's rewritten expression.
4. If multiple aliases match, report alias ambiguity.
5. If no alias matches, keep the identifier unchanged and let existing
analysis handle it.

The rewriter should not traverse into SubqueryExpression.

It should also handle DereferenceExpression carefully so that expressions
such as t.x, table1.x, and x.y are not rewritten as alias references.
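The rules above can be sketched as a small recursive rewriter. Everything here is hypothetical: Node, Identifier, Dereference, Subquery and Arith are toy classes, not IoTDB's AST, and aliasesByName maps a canonical alias name to the rewritten defining expressions of all earlier items with that name. The numbered comments refer to the steps listed above.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy nodes. Plain classes (not records) so identity matters, as with NodeRef.
abstract class Node {
    abstract Node deepCopy();
}

final class Identifier extends Node {
    final String name;
    Identifier(String name) { this.name = name; }
    @Override Node deepCopy() { return new Identifier(name); }
    @Override public String toString() { return name; }
}

// Qualified reference such as t.x, table1.x or x.y.
final class Dereference extends Node {
    final Node base;
    final String field;
    Dereference(Node base, String field) { this.base = base; this.field = field; }
    @Override Node deepCopy() { return new Dereference(base.deepCopy(), field); }
    @Override public String toString() { return base + "." + field; }
}

// Opaque scalar subquery; its body belongs to another scope.
final class Subquery extends Node {
    final String sql;
    Subquery(String sql) { this.sql = sql; }
    @Override Node deepCopy() { return new Subquery(sql); }
    @Override public String toString() { return "(" + sql + ")"; }
}

final class Arith extends Node {
    final String op;
    final Node left;
    final Node right;
    Arith(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    @Override Node deepCopy() { return new Arith(op, left.deepCopy(), right.deepCopy()); }
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

final class LcaRewriter {
    static Node rewrite(Node node, Set<String> localColumns, Map<String, List<Node>> aliasesByName) {
        if (node instanceof Identifier id) {
            String name = id.name.toLowerCase(Locale.ROOT);
            if (localColumns.contains(name)) {
                return id;                                            // 1. local column wins
            }
            List<Node> matches = aliasesByName.getOrDefault(name, List.of()); // 2. look up aliases
            if (matches.size() > 1) {                                 // 4. ambiguous alias
                throw new IllegalArgumentException("Column alias '" + name + "' is ambiguous");
            }
            if (matches.size() == 1) {
                return matches.get(0).deepCopy();                     // 3. fresh copy per reference
            }
            return id;                                                // 5. leave to existing analysis
        }
        if (node instanceof Dereference || node instanceof Subquery) {
            return node; // never rewrite qualified names or enter subquery scopes
        }
        if (node instanceof Arith a) {
            return new Arith(a.op, rewrite(a.left, localColumns, aliasesByName),
                    rewrite(a.right, localColumns, aliasesByName));
        }
        return node;
    }
}
```

Rewriting x + table1.x with alias x -> s1 yields (s1 + table1.x), a subquery node is returned untouched, and x + x produces two distinct copies of s1.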

Each alias replacement must create a deep copy of the defining expression.
Reusing the same expression instance is unsafe because IoTDB uses NodeRef
as identity-based keys in analysis maps.
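A tiny illustration of why sharing is unsafe, using java.util.IdentityHashMap as a stand-in for NodeRef-keyed analysis maps and a hypothetical FunctionCall class: if the same instance is recorded for two SELECT items, the second entry silently replaces the first.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical AST node; like nodes used with NodeRef, it is tracked by identity.
final class FunctionCall {
    final String name;
    FunctionCall(String name) { this.name = name; }
    FunctionCall deepCopy() { return new FunctionCall(name); }
}

final class IdentityKeyDemo {
    // Records per-occurrence facts in an identity-keyed map and returns
    // what is stored for the first occurrence afterwards.
    static String factForFirst(FunctionCall first, FunctionCall second) {
        Map<FunctionCall, String> analysisFacts = new IdentityHashMap<>();
        analysisFacts.put(first, "fact from SELECT item 1");
        analysisFacts.put(second, "fact from SELECT item 2");
        return analysisFacts.get(first);
    }
}
```

factForFirst(avg, avg) returns the fact from item 2 because item 1's entry was replaced, while factForFirst(avg, avg.deepCopy()) keeps item 1's fact.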

AllColumns and COLUMNS(...)

AllColumns should not register aliases for LCA.

A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
today, but its alias should not be registered as a reusable LCA alias
because it may expand to multiple output columns.

SELECT COLUMNS('s.*') AS x, x + 1 AS y
FROM table1;

Here, the alias x defined by COLUMNS(...) is not registered, so the x in
x + 1 is resolved by the existing analyzer rules instead.
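A sketch of the registration rule, assuming a hypothetical SelectItem model in which a boolean flag replaces the real containsColumnsFunction(singleColumn) check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical SELECT item model; IoTDB's AllColumns and SingleColumn look different.
interface SelectItem {}

record AllColumns() implements SelectItem {}

// The boolean flag stands in for a real containsColumnsFunction(singleColumn) check.
record SingleColumn(String expression, Optional<String> alias, boolean containsColumnsFunction)
        implements SelectItem {}

final class AliasRegistration {
    // Returns, in SELECT order, the aliases that later items may reference.
    static List<String> registrableAliases(List<SelectItem> items) {
        List<String> aliases = new ArrayList<>();
        for (SelectItem item : items) {
            // AllColumns and COLUMNS(...) items may expand to several outputs, so they register nothing.
            if (item instanceof SingleColumn column
                    && column.alias().isPresent()
                    && !column.containsColumnsFunction()) {
                aliases.add(column.alias().get());
            }
        }
        return aliases;
    }
}
```

For SELECT *, COLUMNS('s.*') AS x, s1 AS z, s2 only z is registered.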

Aggregation and window functions

LCA may reference previous aggregate expressions:

SELECT avg(s1) AS a, a + 1 AS b
FROM table1;

This should be rewritten to:

SELECT avg(s1) AS a, avg(s1) + 1 AS b
FROM table1;

Existing aggregation validity checks should still apply after rewriting.
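As a rough illustration that the check runs on rewritten expressions, here is a greatly simplified validity rule for an aggregation query: outside an aggregate, a plain column must be a grouping key. The toy classes are hypothetical, and the real check also covers nested aggregates and grouping expressions, which this sketch ignores.

```java
import java.util.Set;

// Hypothetical toy expressions, not IoTDB's AST.
interface Expr {}
record Column(String name) implements Expr {}
record Aggregate(String function, Expr argument) implements Expr {}
record Plus(Expr left, Expr right) implements Expr {}
record Literal(long value) implements Expr {}

final class AggregationCheck {
    // Simplified stand-in for the existing check: outside an aggregate,
    // a column may only appear if it is a grouping key.
    static boolean isValid(Expr e, Set<String> groupingKeys) {
        if (e instanceof Column c) {
            return groupingKeys.contains(c.name());
        }
        if (e instanceof Aggregate) {
            return true; // the aggregate's argument may reference any input column
        }
        if (e instanceof Plus p) {
            return isValid(p.left(), groupingKeys) && isValid(p.right(), groupingKeys);
        }
        return true; // literals
    }
}
```

avg(s1) + 1 passes with no GROUP BY, while avg(s2) + s1 (the rewritten form of avg(s2) + x in SELECT s1 AS x, avg(s2) + x AS y) fails because s1 appears outside an aggregate.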

For window functions, supporting alias references is preferred:

SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
FROM table1;

Since LCA deep copy creates new FunctionCall nodes, any copied window
functions need their resolved window metadata registered again in Analysis.
If this is not supported in the first implementation, we should reject such
aliases explicitly with a clear error instead of failing silently.
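The re-registration can be sketched with an identity-keyed registry. WindowCall, ResolvedWindow and WindowRegistry are hypothetical stand-ins for the FunctionCall nodes and the window metadata kept in Analysis; copyForAlias registers metadata for the copy when it exists and otherwise fails with an explicit error.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a window FunctionCall such as row_number() OVER (ORDER BY s1).
final class WindowCall {
    final String function;
    final String orderBy;
    WindowCall(String function, String orderBy) { this.function = function; this.orderBy = orderBy; }
}

// Hypothetical resolved window metadata.
record ResolvedWindow(String orderBy) {}

final class WindowRegistry {
    // Keyed by node identity, like NodeRef-keyed maps in Analysis.
    private final Map<WindowCall, ResolvedWindow> resolved = new IdentityHashMap<>();

    void register(WindowCall call) {
        resolved.put(call, new ResolvedWindow(call.orderBy));
    }

    ResolvedWindow lookup(WindowCall call) {
        return resolved.get(call);
    }

    // Copies a window call for an LCA replacement and registers metadata for the copy.
    WindowCall copyForAlias(WindowCall original) {
        ResolvedWindow metadata = resolved.get(original);
        if (metadata == null) {
            // Explicit error instead of handing out an unregistered copy.
            throw new IllegalStateException(
                    "Window function " + original.function + " cannot be referenced through an alias");
        }
        WindowCall copy = new WindowCall(original.function, original.orderBy);
        resolved.put(copy, metadata);
        return copy;
    }
}
```

A plain copy of row_number() OVER (ORDER BY s1) has no metadata, whereas the copy returned by copyForAlias does.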

GROUP BY and ORDER BY compatibility

This should remain compatible with Part 1 alias reuse behavior.

SELECT date_bin(1h, time) AS hour_time, avg(s1)
FROM table1
GROUP BY hour_time
ORDER BY hour_time;

GROUP BY <alias> should continue to resolve to the corresponding select
expression, while ORDER BY <alias> can continue to resolve to the output
field reference.

Because SELECT output expressions are already rewritten, Part 1 and Part 2
should compose naturally:

SELECT s1 AS x, x + 1 AS y, count(*)
FROM table1
GROUP BY y;

GROUP BY y should resolve to:

s1 + 1

Suggested tests

I plan to add or update tests in SelectAliasReuseTest for:

SELECT s1 AS x, x + 1 AS y FROM table1;
SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
SELECT s1 AS x, x + x AS y FROM table1;
SELECT y + 1 AS x, s1 AS y FROM table1;
SELECT x + 1 AS x FROM table1;
SELECT s1 AS x, x + 1 AS y FROM table_with_x;
SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
SELECT s1 AS x, table1.x + 1 AS y FROM table1;
SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
SELECT avg(s1) AS a, a + 1 AS b FROM table1;
SELECT s1 AS x, avg(s2) + x AS y FROM table1;
SELECT avg(s1) AS a FROM table1 HAVING a > 1;
SELECT s1 AS x FROM table1 WHERE x > 1;
SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
SELECT s1 AS x, x FROM table1;

Please let me know whether this direction looks reasonable, especially the
resolution priority, duplicate alias handling, and the preferred behavior
for window function aliases.

*Best regards, Bryan Yang(杨易达)*
