Hi Yuan Tian,

I have completed the implementation for Issue #17797 Part 2: supporting
lateral column aliases in the table-model SELECT list.
The PR is available here: https://github.com/apache/iotdb/pull/17960

This PR adds left-to-right SELECT-list alias resolution, keeps local input
columns at higher priority than SELECT aliases, avoids rewriting qualified
expressions and subqueries, preserves WHERE/HAVING semantics, and keeps the
existing ORDER BY alias behavior. It also includes analyzer tests and a
plan-level test to ensure reused aliases do not cause duplicated projection
computation.

I have verified the changes with:

    ./mvnw spotless:apply -pl iotdb-core/datanode
    ./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

The PR is ready for review. It will close #17797 after being merged.

Best regards,
Bryan Yang (杨易达)

On Tue, Jun 16, 2026 at 2:22 PM Yuan Tian <[email protected]> wrote:

> Hi Bryan,
>
> Thanks for the detailed proposal. Overall, the direction looks reasonable
> to me. Reusing the existing SELECT syntax and resolving LCA during
> analysis also seems consistent with the current alias-reuse work for
> GROUP BY / ORDER BY.
>
> A few comments after looking at the current analyzer and planner code:
>
> 1. The proposed resolution priority looks good to me:
>
>     local source column > visible previous aliases > existing analyzer resolution
>
> This matches the current GROUP BY alias behavior, where local input
> columns take precedence over SELECT aliases, while outer-scope columns
> should not block aliases defined in the current SELECT list.
>
> 2. I agree that duplicate aliases should be treated as ambiguous instead
> of overwriting each other. This is also consistent with the current
> SELECT alias reuse behavior.
>
> 3. Please be careful with performance. For queries that do not use LCA,
> we should avoid adding noticeable overhead. For example, if there are no
> visible previous aliases, the rewrite step should be skipped directly.
> Also, alias lookup should probably use a canonical-name map/multimap
> instead of scanning a list for every Identifier.
>
> 4. For queries that do use LCA, deep-copying the defining expression is
> necessary because Analysis uses NodeRef identity-based maps. However,
> pure expression inlining may duplicate scalar computations in the
> generated plan. For example:
>
>     SELECT expensive(s1) AS x, x + 1 AS y
>     FROM table1;
>
> If this is rewritten as:
>
>     SELECT expensive(s1) AS x, expensive(s1) + 1 AS y
>     FROM table1;
>
> then the scalar expression may be evaluated twice unless the planner
> later deduplicates it. This is especially worth checking for chained
> aliases, where repeated deep copies could cause the expression tree to
> grow quickly. I think we should add plan-level tests for this, not only
> analyzer tests.
>
> 5. I agree that the rewriter must not enter SubqueryExpression, and
> DereferenceExpression needs special handling. In particular, expressions
> like t.x, table1.x, and x.y should not rewrite the base identifier as an
> alias reference.
>
> 6. For window function aliases, I agree with your note: either copied
> window functions must have their resolved window metadata registered
> again, or the first implementation should reject this case explicitly
> with a clear error message.
>
> One small wording point: I think “Forward references are not supported”
> is the correct English term here, because the expression is referring to
> an alias defined later in the SELECT list. To avoid ambiguity, maybe we
> can phrase it as:
>
> “Forward references, i.e., references to aliases defined later in the
> SELECT list, are not supported.”
>
> Best regards,
> ---------------
> Yuan Tian
>
>
> On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:
>
> > *Hi IoTDB community,*
> >
> > I would like to propose the implementation plan for Part 2 of
> > apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
> > model SELECT list.
> > issue: https://github.com/apache/iotdb/issues/17797
> >
> > LCA allows a later SELECT item to reference an explicit alias defined
> > by an earlier SELECT item.
> >
> >     SELECT s1 AS x, x + 1 AS y
> >     FROM table1;
> >
> > This should be analyzed as:
> >
> >     SELECT s1 AS x, s1 + 1 AS y
> >     FROM table1;
> >
> > The implementation does not require new keywords or grammar changes.
> > The existing SELECT syntax can be reused, and LCA can be resolved
> > during analysis by rewriting expressions before type analysis,
> > aggregation analysis, output scope computation, and planning.
> >
> > Proposed semantics
> >
> > LCA is resolved from left to right inside the SELECT list.
> >
> >     SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> >     FROM table1;
> >
> > is equivalent to:
> >
> >     SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> >     FROM table1;
> >
> > Forward references are not supported:
> >
> >     SELECT y + 1 AS x, s1 AS y
> >     FROM table1;
> >
> > The alias of the current SELECT item is not visible to its own
> > expression:
> >
> >     SELECT x + 1 AS x
> >     FROM table1;
> >
> > Only unqualified identifiers are considered alias references. Qualified
> > expressions such as t.x, table1.x, or x.y should continue to use the
> > existing qualified column resolution rules.
> >
> > The recommended resolution priority is:
> >
> >     local source column > visible previous aliases > existing analyzer resolution
> >
> > This means local input columns should take precedence over previous
> > aliases, while outer query columns should not block visible aliases
> > from the current SELECT list.
> >
> > Duplicate aliases should not overwrite each other. If multiple previous
> > aliases have the same canonical name and there is no same-name local
> > source column, the analyzer should report:
> >
> >     Column alias 'x' is ambiguous
> >
> > Non-goals
> >
> > This change should not alter WHERE or HAVING semantics.
> >
> >     SELECT s1 AS x FROM table1 WHERE x > 1;
> >     SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> >
> > These should still resolve x or a only through the input scope, not
> > through SELECT aliases.
> >
> > LCA should also not enter subquery scopes:
> >
> >     SELECT s1 AS x, (SELECT x FROM table1) AS y
> >     FROM table1;
> >
> > The x inside the subquery should not be rewritten using the outer
> > SELECT alias.
> >
> > Analyzer changes
> >
> > The entry point should be StatementAnalyzer.analyzeSelect.
> >
> > The SELECT list can be processed left to right. For each normal
> > SingleColumn:
> >
> > 1. Rewrite the original expression using visible previous aliases.
> > 2. Register window metadata for any newly copied window functions.
> > 3. Analyze the rewritten expression.
> > 4. Record the rewritten expression in SelectAnalysis output
> >    expressions.
> > 5. Record SingleColumn -> rewritten expression mapping.
> > 6. Add the current explicit alias to visible aliases only after its
> >    expression is rewritten and analyzed.
> >
> > Pseudo-code:
> >
> > List<SelectAlias> visibleAliases = new ArrayList<>();
> >
> > for (SelectItem item : node.getSelect().getSelectItems()) {
> >   if (item instanceof SingleColumn) {
> >     SingleColumn singleColumn = (SingleColumn) item;
> >
> >     Expression originalExpression = singleColumn.getExpression();
> >     Expression rewrittenExpression =
> >         rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);
> >
> >     resolveWindowFunctionsInExpression(node, rewrittenExpression);
> >
> >     analyzeSelectSingleColumn(
> >         rewrittenExpression,
> >         node,
> >         scope,
> >         outputExpressionBuilder,
> >         selectExpressionBuilder);
> >
> >     singleColumnOutputExpressions.put(
> >         NodeRef.of(singleColumn),
> >         ImmutableList.of(rewrittenExpression));
> >
> >     if (singleColumn.getAlias().isPresent()
> >         && !containsColumnsFunction(singleColumn)) {
> >       Identifier alias = singleColumn.getAlias().get();
> >       visibleAliases.add(
> >           new SelectAlias(
> >               alias.getCanonicalValue(),
> >               outputPosition,
> >               rewrittenExpression));
> >     }
> >
> >     outputPosition++;
> >   }
> > }
> >
> > SelectAlias can be extended to keep:
> >
> >     canonicalName
> >     position
> >     rewrittenExpression
> >
> > SelectAnalysis should keep semantic output expressions per
> > SingleColumn, for example:
> >
> >     NodeRef<SingleColumn> -> List<Expression>
> >
> > The original AST should remain unchanged because
> > SingleColumn.expression is final.
> >
> > Expression rewriting
> >
> > LCA rewriting should replace only unqualified Identifiers.
> >
> > When an identifier is encountered:
> >
> > 1. If it resolves to a local input column, keep it unchanged.
> > 2. Otherwise, look up visible previous aliases.
> > 3. If exactly one alias matches, replace the identifier with a deep
> >    copy of that alias's rewritten expression.
> > 4. If multiple aliases match, report alias ambiguity.
> > 5. If no alias matches, keep the identifier unchanged and let existing
> >    analysis handle it.
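
A minimal sketch of the five lookup rules above, written against a toy
expression tree rather than the real IoTDB AST. Every name in it
(LateralAliasSketch, Expr, addSelectItem, rewrite) is hypothetical and is not
taken from the PR, and qualified names and subqueries are left out entirely.
It assumes aliases are kept in a map from canonical name to all earlier
definitions, so duplicates surface as ambiguity instead of overwriting each
other, and it skips the rewrite while no alias is visible yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the IoTDB code base.
final class LateralAliasSketch {

  // Toy expression tree: an identifier leaf or a binary operation.
  static final class Expr {
    final String name; // non-null for identifier leaves
    final String op;   // non-null for binary nodes
    final Expr left;
    final Expr right;

    private Expr(String name, String op, Expr left, Expr right) {
      this.name = name;
      this.op = op;
      this.left = left;
      this.right = right;
    }

    static Expr id(String name) {
      return new Expr(name, null, null, null);
    }

    static Expr bin(String op, Expr left, Expr right) {
      return new Expr(null, op, left, right);
    }

    // Builds fresh node instances, so identity-keyed maps never share a node.
    Expr deepCopy() {
      return name != null ? id(name) : bin(op, left.deepCopy(), right.deepCopy());
    }

    @Override
    public String toString() {
      return name != null ? name : "(" + left + " " + op + " " + right + ")";
    }
  }

  private final Set<String> localColumns = new HashSet<>();
  // Canonical alias name -> every earlier definition with that name.
  private final Map<String, List<Expr>> visibleAliases = new HashMap<>();

  LateralAliasSketch(String... columns) {
    for (String column : columns) {
      localColumns.add(canonical(column));
    }
  }

  static String canonical(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  Expr rewrite(Expr e) {
    if (e.name == null) {
      return Expr.bin(e.op, rewrite(e.left), rewrite(e.right));
    }
    String key = canonical(e.name);
    if (localColumns.contains(key)) {
      return e; // rule 1: a local input column wins
    }
    List<Expr> definitions = visibleAliases.get(key); // rule 2
    if (definitions == null) {
      return e; // rule 5: leave it to the existing analysis
    }
    if (definitions.size() > 1) { // rule 4
      throw new IllegalStateException("Column alias '" + e.name + "' is ambiguous");
    }
    return definitions.get(0).deepCopy(); // rule 3
  }

  // Rewrites one SELECT item, then makes its alias visible to later items.
  Expr addSelectItem(Expr expression, String alias) {
    // Fast path: nothing to rewrite while no alias is visible.
    Expr rewritten = visibleAliases.isEmpty() ? expression : rewrite(expression);
    if (alias != null) {
      visibleAliases
          .computeIfAbsent(canonical(alias), k -> new ArrayList<>())
          .add(rewritten);
    }
    return rewritten;
  }

  public static void main(String[] args) {
    // SELECT s1 AS x, x + s2 AS y, y * x AS z FROM table1
    LateralAliasSketch select = new LateralAliasSketch("s1", "s2");
    select.addSelectItem(Expr.id("s1"), "x");
    System.out.println(select.addSelectItem(Expr.bin("+", Expr.id("x"), Expr.id("s2")), "y"));
    System.out.println(select.addSelectItem(Expr.bin("*", Expr.id("y"), Expr.id("x")), "z"));
  }
}
```

For SELECT s1 AS x, x + s2 AS y, y * x AS z the sketch yields (s1 + s2) for y
and ((s1 + s2) * s1) for z, with a fresh copy made at every use. Because an
alias is registered only after its own expression has been rewritten, forward
references and self references simply fall through to rule 5.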
> >
> > The rewriter should not traverse into SubqueryExpression.
> >
> > It should also handle DereferenceExpression carefully so that
> > expressions such as t.x, table1.x, and x.y are not rewritten as alias
> > references.
> >
> > Each alias replacement must create a deep copy of the defining
> > expression. Reusing the same expression instance is unsafe because
> > IoTDB uses NodeRef as identity-based keys in analysis maps.
> >
> > AllColumns and COLUMNS(...)
> >
> > AllColumns should not register aliases for LCA.
> >
> > A SingleColumn containing COLUMNS(...) should be expanded and analyzed
> > as today, but its alias should not be registered as a reusable LCA
> > alias because it may expand to multiple output columns.
> >
> >     SELECT COLUMNS('s.*') AS x, x + 1 AS y
> >     FROM table1;
> >
> > Here, x should not be treated as a unique alias definition from
> > COLUMNS(...).
> >
> > Aggregation and window functions
> >
> > LCA may reference previous aggregate expressions:
> >
> >     SELECT avg(s1) AS a, a + 1 AS b
> >     FROM table1;
> >
> > This should be rewritten to:
> >
> >     SELECT avg(s1) AS a, avg(s1) + 1 AS b
> >     FROM table1;
> >
> > Existing aggregation validity checks should still apply after
> > rewriting.
> >
> > For window functions, supporting alias references is preferred:
> >
> >     SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> >     FROM table1;
> >
> > Since LCA deep copy creates new FunctionCall nodes, any copied window
> > functions need their resolved window metadata registered again in
> > Analysis. If this is not supported in the first implementation, we
> > should reject such aliases explicitly with a clear error instead of
> > failing silently.
> >
> > GROUP BY and ORDER BY compatibility
> >
> > This should remain compatible with Part 1 alias reuse behavior.
> >
> >     SELECT date_bin(1h, time) AS hour_time, avg(s1)
> >     FROM table1
> >     GROUP BY hour_time
> >     ORDER BY hour_time;
> >
> > GROUP BY <alias> should continue to resolve to the corresponding select
> > expression, while ORDER BY <alias> can continue to resolve to the
> > output field reference.
> >
> > Because SELECT output expressions are already rewritten, Part 1 and
> > Part 2 should compose naturally:
> >
> >     SELECT s1 AS x, x + 1 AS y, count(*)
> >     FROM table1
> >     GROUP BY y;
> >
> > GROUP BY y should resolve to:
> >
> >     s1 + 1
> >
> > Suggested tests
> >
> > I plan to add or update tests in SelectAliasReuseTest for:
> >
> >     SELECT s1 AS x, x + 1 AS y FROM table1;
> >     SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> >     SELECT s1 AS x, x + x AS y FROM table1;
> >     SELECT y + 1 AS x, s1 AS y FROM table1;
> >     SELECT x + 1 AS x FROM table1;
> >     SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> >     SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> >     SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> >     SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> >     SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> >     SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> >     SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> >     SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> >     SELECT s1 AS x FROM table1 WHERE x > 1;
> >     SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> >     SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> >     SELECT s1 AS x, x FROM table1;
> >
> > Please let me know whether this direction looks reasonable, especially
> > the resolution priority, duplicate alias handling, and the preferred
> > behavior for window function aliases.
> >
> > *Best regards, Bryan Yang (杨易达)*
