Hi Bryan,

I've just reviewed your PR and left some comments [1]. There are a few places that you need to fix.

[1] https://github.com/apache/iotdb/pull/17960#discussion_r3432746539

Best regards,
----------------------
Yuan Tian

On Wed, Jun 17, 2026 at 2:53 PM Bryan Yang <[email protected]> wrote:

> Hi Yuan Tian,
>
> I have completed the implementation for Issue #17797 Part 2: supporting lateral column aliases in the table-model SELECT list.
>
> The PR is available here:
>
> https://github.com/apache/iotdb/pull/17960
>
> This PR adds left-to-right SELECT-list alias resolution, keeps local input columns at higher priority than SELECT aliases, avoids rewriting qualified expressions and subqueries, preserves WHERE/HAVING semantics, and keeps the existing ORDER BY alias behavior. It also includes analyzer tests and a plan-level test to ensure reused aliases do not cause duplicated projection computation.
>
> I have verified the changes with:
>
>     ./mvnw spotless:apply -pl iotdb-core/datanode
>     ./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest
>
> The PR is ready for review. It will close #17797 after being merged.
>
> Best regards,
> Bryan Yang (杨易达)
>
> Yuan Tian <[email protected]> wrote on Tue, Jun 16, 2026 at 14:22:
>
> > Hi Bryan,
> >
> > Thanks for the detailed proposal. Overall, the direction looks reasonable to me. Reusing the existing SELECT syntax and resolving LCA during analysis also seems consistent with the current alias-reuse work for GROUP BY / ORDER BY.
> >
> > A few comments after looking at the current analyzer and planner code:
> >
> > 1. The proposed resolution priority looks good to me:
> >
> >     local source column > visible previous aliases > existing analyzer resolution
> >
> > This matches the current GROUP BY alias behavior, where local input columns take precedence over SELECT aliases, while outer-scope columns should not block aliases defined in the current SELECT list.
> >
> > 2. I agree that duplicate aliases should be treated as ambiguous instead of overwriting each other.
> > This is also consistent with the current SELECT alias reuse behavior.
> >
> > 3. Please be careful with performance. For queries that do not use LCA, we should avoid adding noticeable overhead. For example, if there are no visible previous aliases, the rewrite step should be skipped directly. Also, alias lookup should probably use a canonical-name map/multimap instead of scanning a list for every Identifier.
> >
> > 4. For queries that do use LCA, deep-copying the defining expression is necessary because Analysis uses NodeRef identity-based maps. However, pure expression inlining may duplicate scalar computations in the generated plan. For example:
> >
> >     SELECT expensive(s1) AS x, x + 1 AS y
> >     FROM table1;
> >
> > If this is rewritten as:
> >
> >     SELECT expensive(s1) AS x, expensive(s1) + 1 AS y
> >     FROM table1;
> >
> > then the scalar expression may be evaluated twice unless the planner later deduplicates it. This is especially worth checking for chained aliases, where repeated deep copies could cause the expression tree to grow quickly. I think we should add plan-level tests for this, not only analyzer tests.
> >
> > 5. I agree that the rewriter must not enter SubqueryExpression, and DereferenceExpression needs special handling. In particular, expressions like t.x, table1.x, and x.y should not rewrite the base identifier as an alias reference.
> >
> > 6. For window function aliases, I agree with your note: either copied window functions must have their resolved window metadata registered again, or the first implementation should reject this case explicitly with a clear error message.
> >
> > One small wording point: I think “Forward references are not supported” is the correct English term here, because the expression is referring to an alias defined later in the SELECT list.
> > To avoid ambiguity, maybe we can phrase it as:
> >
> > “Forward references, i.e., references to aliases defined later in the SELECT list, are not supported.”
> >
> > Best regards,
> > ---------------
> > Yuan Tian
> >
> > On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:
> >
> > > *Hi IoTDB community,*
> > >
> > > I would like to propose the implementation plan for Part 2 of apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table model SELECT list.
> > >
> > > Issue: https://github.com/apache/iotdb/issues/17797
> > >
> > > LCA allows a later SELECT item to reference an explicit alias defined by an earlier SELECT item.
> > >
> > >     SELECT s1 AS x, x + 1 AS y
> > >     FROM table1;
> > >
> > > This should be analyzed as:
> > >
> > >     SELECT s1 AS x, s1 + 1 AS y
> > >     FROM table1;
> > >
> > > The implementation does not require new keywords or grammar changes. The existing SELECT syntax can be reused, and LCA can be resolved during analysis by rewriting expressions before type analysis, aggregation analysis, output scope computation, and planning.
> > >
> > > Proposed semantics
> > >
> > > LCA is resolved from left to right inside the SELECT list.
> > >
> > >     SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> > >     FROM table1;
> > >
> > > is equivalent to:
> > >
> > >     SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> > >     FROM table1;
> > >
> > > Forward references are not supported:
> > >
> > >     SELECT y + 1 AS x, s1 AS y
> > >     FROM table1;
> > >
> > > The alias of the current SELECT item is not visible to its own expression:
> > >
> > >     SELECT x + 1 AS x
> > >     FROM table1;
> > >
> > > Only unqualified identifiers are considered alias references. Qualified expressions such as t.x, table1.x, or x.y should continue to use the existing qualified column resolution rules.
> > >
> > > The recommended resolution priority is:
> > >
> > >     local source column > visible previous aliases > existing analyzer resolution
> > >
> > > This means local input columns should take precedence over previous aliases, while outer query columns should not block visible aliases from the current SELECT list.
> > >
> > > Duplicate aliases should not overwrite each other. If multiple previous aliases have the same canonical name and there is no same-name local source column, the analyzer should report:
> > >
> > >     Column alias 'x' is ambiguous
> > >
> > > Non-goals
> > >
> > > This change should not alter WHERE or HAVING semantics.
> > >
> > >     SELECT s1 AS x FROM table1 WHERE x > 1;
> > >     SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> > >
> > > These should still resolve x or a only through the input scope, not through SELECT aliases.
> > >
> > > LCA should also not enter subquery scopes:
> > >
> > >     SELECT s1 AS x, (SELECT x FROM table1) AS y
> > >     FROM table1;
> > >
> > > The x inside the subquery should not be rewritten using the outer SELECT alias.
> > >
> > > Analyzer changes
> > >
> > > The entry point should be StatementAnalyzer.analyzeSelect.
> > >
> > > The SELECT list can be processed left to right. For each normal SingleColumn:
> > >
> > > 1. Rewrite the original expression using visible previous aliases.
> > > 2. Register window metadata for any newly copied window functions.
> > > 3. Analyze the rewritten expression.
> > > 4. Record the rewritten expression in SelectAnalysis output expressions.
> > > 5. Record SingleColumn -> rewritten expression mapping.
> > > 6. Add the current explicit alias to visible aliases only after its expression is rewritten and analyzed.
> > >
> > > Pseudo-code:
> > >
> > >     List<SelectAlias> visibleAliases = new ArrayList<>();
> > >     int outputPosition = 0;
> > >
> > >     for (SelectItem item : node.getSelect().getSelectItems()) {
> > >       if (item instanceof SingleColumn) {
> > >         SingleColumn singleColumn = (SingleColumn) item;
> > >
> > >         // Step 1: rewrite using only the aliases defined to the left.
> > >         Expression originalExpression = singleColumn.getExpression();
> > >         Expression rewrittenExpression =
> > >             rewriteLateralColumnAlias(originalExpression, scope, visibleAliases);
> > >
> > >         // Step 2: copied window functions need their metadata registered again.
> > >         resolveWindowFunctionsInExpression(node, rewrittenExpression);
> > >
> > >         // Steps 3-4: analyze and record the rewritten expression.
> > >         analyzeSelectSingleColumn(
> > >             rewrittenExpression,
> > >             node,
> > >             scope,
> > >             outputExpressionBuilder,
> > >             selectExpressionBuilder);
> > >
> > >         // Step 5: the AST node stays untouched; the mapping carries the rewrite.
> > >         singleColumnOutputExpressions.put(
> > >             NodeRef.of(singleColumn),
> > >             ImmutableList.of(rewrittenExpression));
> > >
> > >         // Step 6: the alias becomes visible only to later SELECT items.
> > >         if (singleColumn.getAlias().isPresent()
> > >             && !containsColumnsFunction(singleColumn)) {
> > >           Identifier alias = singleColumn.getAlias().get();
> > >           visibleAliases.add(
> > >               new SelectAlias(
> > >                   alias.getCanonicalValue(),
> > >                   outputPosition,
> > >                   rewrittenExpression));
> > >         }
> > >
> > >         outputPosition++;
> > >       }
> > >     }
> > >
> > > SelectAlias can be extended to keep:
> > >
> > >     canonicalName
> > >     position
> > >     rewrittenExpression
> > >
> > > SelectAnalysis should keep semantic output expressions per SingleColumn, for example:
> > >
> > >     NodeRef<SingleColumn> -> List<Expression>
> > >
> > > The original AST should remain unchanged because SingleColumn.expression is final.
> > >
> > > Expression rewriting
> > >
> > > LCA rewriting should replace only unqualified Identifiers.
> > >
> > > When an identifier is encountered:
> > >
> > > 1. If it resolves to a local input column, keep it unchanged.
> > > 2. Otherwise, look up visible previous aliases.
> > > 3. If exactly one alias matches, replace the identifier with a deep copy of that alias's rewritten expression.
> > > 4. If multiple aliases match, report alias ambiguity.
> > > 5. If no alias matches, keep the identifier unchanged and let existing analysis handle it.
> > >
> > > The rewriter should not traverse into SubqueryExpression.
> > >
> > > It should also handle DereferenceExpression carefully so that expressions such as t.x, table1.x, and x.y are not rewritten as alias references.
> > >
> > > Each alias replacement must create a deep copy of the defining expression. Reusing the same expression instance is unsafe because IoTDB uses NodeRef as identity-based keys in analysis maps.
> > >
> > > AllColumns and COLUMNS(...)
> > >
> > > AllColumns should not register aliases for LCA.
> > >
> > > A SingleColumn containing COLUMNS(...) should be expanded and analyzed as today, but its alias should not be registered as a reusable LCA alias because it may expand to multiple output columns.
> > >
> > >     SELECT COLUMNS('s.*') AS x, x + 1 AS y
> > >     FROM table1;
> > >
> > > Here, x should not be treated as a unique alias definition from COLUMNS(...).
> > >
> > > Aggregation and window functions
> > >
> > > LCA may reference previous aggregate expressions:
> > >
> > >     SELECT avg(s1) AS a, a + 1 AS b
> > >     FROM table1;
> > >
> > > This should be rewritten to:
> > >
> > >     SELECT avg(s1) AS a, avg(s1) + 1 AS b
> > >     FROM table1;
> > >
> > > Existing aggregation validity checks should still apply after rewriting.
> > >
> > > For window functions, supporting alias references is preferred:
> > >
> > >     SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> > >     FROM table1;
> > >
> > > Since LCA deep copy creates new FunctionCall nodes, any copied window functions need their resolved window metadata registered again in Analysis. If this is not supported in the first implementation, we should reject such aliases explicitly with a clear error instead of failing silently.
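
The point about identity-based keys can be illustrated with a minimal sketch. It assumes that a NodeRef-keyed analysis map behaves like `java.util.IdentityHashMap` (keys compared with `==`), which is how the thread describes it. `Call`, `registerCopy`, and the metadata string are hypothetical stand-ins, not IoTDB classes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityKeySketch {
  /** Hypothetical stand-in for an AST node such as FunctionCall. */
  static final class Call {
    final String name;

    Call(String name) {
      this.name = name;
    }

    Call deepCopy() {
      return new Call(name);
    }
  }

  /** Registers the original node's metadata for its copy as well. */
  static void registerCopy(Map<Call, String> metadata, Call original, Call copy) {
    metadata.put(copy, metadata.get(original));
  }

  public static void main(String[] args) {
    // Assumption: analysis maps compare keys by reference, like IdentityHashMap.
    Map<Call, String> windowMetadata = new IdentityHashMap<>();
    Call original = new Call("row_number");
    windowMetadata.put(original, "OVER (ORDER BY s1)");

    Call copy = original.deepCopy();
    // The copy is a different object, so it has no metadata yet.
    System.out.println(windowMetadata.containsKey(copy)); // false

    registerCopy(windowMetadata, original, copy);
    System.out.println(windowMetadata.get(copy)); // OVER (ORDER BY s1)
    System.out.println(windowMetadata.size()); // 2
  }
}
```

The same property explains why sharing one instance between two SELECT items would be unsafe: both occurrences would map to a single entry, so per-occurrence analysis results could overwrite each other.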
> > >
> > > GROUP BY and ORDER BY compatibility
> > >
> > > This should remain compatible with Part 1 alias reuse behavior.
> > >
> > >     SELECT date_bin(1h, time) AS hour_time, avg(s1)
> > >     FROM table1
> > >     GROUP BY hour_time
> > >     ORDER BY hour_time;
> > >
> > > GROUP BY <alias> should continue to resolve to the corresponding select expression, while ORDER BY <alias> can continue to resolve to the output field reference.
> > >
> > > Because SELECT output expressions are already rewritten, Part 1 and Part 2 should compose naturally:
> > >
> > >     SELECT s1 AS x, x + 1 AS y, count(*)
> > >     FROM table1
> > >     GROUP BY y;
> > >
> > > GROUP BY y should resolve to:
> > >
> > >     s1 + 1
> > >
> > > Suggested tests
> > >
> > > I plan to add or update tests in SelectAliasReuseTest for:
> > >
> > >     SELECT s1 AS x, x + 1 AS y FROM table1;
> > >     SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> > >     SELECT s1 AS x, x + x AS y FROM table1;
> > >     SELECT y + 1 AS x, s1 AS y FROM table1;
> > >     SELECT x + 1 AS x FROM table1;
> > >     SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> > >     SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> > >     SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> > >     SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> > >     SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> > >     SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> > >     SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> > >     SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> > >     SELECT s1 AS x FROM table1 WHERE x > 1;
> > >     SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> > >     SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> > >     SELECT s1 AS x, x FROM table1;
> > >
> > > Please let me know whether this direction looks reasonable, especially the resolution priority, duplicate alias handling, and the preferred behavior for window function aliases.
> > >
> > > *Best regards, Bryan Yang(杨易达)*
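
The resolution rules discussed in this thread (local column first, then a unique previous alias, ambiguity on duplicates, no rewriting of qualified names or subqueries, and a skip when no alias is visible) can be sketched on a toy AST. This is an illustration under stated assumptions, not the PR's implementation: `Expr`, `Ident`, `Binary`, `Opaque`, `addAlias`, and the use of `IllegalStateException` are all hypothetical, and the alias table is a plain canonical-name map to lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LcaRewriteSketch {
  // Hypothetical miniature AST; IoTDB's real Expression classes are richer.
  interface Expr {}

  static final class Ident implements Expr {
    final String name;
    Ident(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }

  static final class Binary implements Expr {
    final String op;
    final Expr left;
    final Expr right;
    Binary(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
  }

  /** Anything the rewriter leaves alone here: literals, qualified names, subqueries. */
  static final class Opaque implements Expr {
    final String text;
    Opaque(String text) { this.text = text; }
    @Override public String toString() { return text; }
  }

  /** Appends to the canonical-name multimap, so duplicates stay detectable. */
  static void addAlias(Map<String, List<Expr>> aliases, String name, Expr definition) {
    aliases.computeIfAbsent(name, k -> new ArrayList<>()).add(definition);
  }

  static Expr rewrite(Expr e, Set<String> localColumns, Map<String, List<Expr>> aliases) {
    if (aliases.isEmpty()) {
      return e; // no visible alias: skip the rewrite entirely
    }
    if (e instanceof Ident) {
      String name = ((Ident) e).name;
      if (localColumns.contains(name)) {
        return e; // local source column wins over aliases
      }
      List<Expr> matches = aliases.get(name);
      if (matches == null) {
        return e; // leave it to the existing analysis
      }
      if (matches.size() > 1) {
        throw new IllegalStateException("Column alias '" + name + "' is ambiguous");
      }
      return copy(matches.get(0)); // fresh nodes for every replacement
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return new Binary(b.op, rewrite(b.left, localColumns, aliases),
          rewrite(b.right, localColumns, aliases));
    }
    return e; // Opaque: never traversed
  }

  static Expr copy(Expr e) {
    if (e instanceof Ident) {
      return new Ident(((Ident) e).name);
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return new Binary(b.op, copy(b.left), copy(b.right));
    }
    return new Opaque(((Opaque) e).text);
  }

  public static void main(String[] args) {
    // SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
    Set<String> local = new HashSet<>(Arrays.asList("s1", "s2"));
    Map<String, List<Expr>> aliases = new HashMap<>();
    String[] names = {"x", "y", "z"};
    Expr[] items = {
      new Ident("s1"),
      new Binary("+", new Ident("x"), new Opaque("1")),
      new Binary("*", new Ident("y"), new Opaque("2"))
    };
    for (int i = 0; i < items.length; i++) {
      Expr rewritten = rewrite(items[i], local, aliases);
      System.out.println(rewritten + " AS " + names[i]);
      addAlias(aliases, names[i], rewritten); // visible only to later items
    }
    // Prints: s1 AS x, then (s1 + 1) AS y, then ((s1 + 1) * 2) AS z
  }
}
```

Registering an alias only after its own item is rewritten gives both the "no forward reference" and the "alias not visible to its own expression" rules without extra checks. The sketch inlines by copying, so it does not address the duplicated-computation concern raised in point 4; that still needs a plan-level check.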
