*Hi Apache IoTDB Community,*

I would like to propose a complete fix for the following issue:

https://github.com/apache/iotdb/issues/17797

This issue aims to support reuse of SELECT output aliases in IoTDB table
model SQL.

The goal is to support the following three cases in one integrated
implementation:

   1. GROUP BY <select_alias>
   2. ORDER BY <select_alias>
   3. Later SELECT items referencing earlier SELECT aliases, also known as
   lateral column alias references

This change does not require parser changes. The current SQL grammar can
already parse these statements. The required work is in analyzer-stage name
resolution, expression rewriting, Analysis recording, and planner
consistency.
Functional Definition

The intended syntax is:

select_item :=
expression [AS] alias

group_by_item :=
expression
| ordinal_position
| select_alias

order_by_item :=
expression [ASC | DESC] [NULLS FIRST | NULLS LAST]
| ordinal_position
| select_alias

lateral_column_alias_reference :=
unqualified_identifier

Here, expression means any expression currently supported by the table
model, including column references, scalar functions, arithmetic
expressions, comparisons, aggregate functions, window functions,
date_bin(...), and date_bin_gapfill(...).

alias means an explicit SELECT item output alias, for example:

s1 + 1 AS x
AVG(s1) AS avg_s1

Only explicit SingleColumn aliases should be included in SELECT alias
metadata. SELECT *, columns(*), and implicit output names should not become
lateral column aliases.

select_alias means an output alias defined in the current SELECT list. It
is visible only in GROUP BY, ORDER BY, and later SELECT items. It is not
visible in WHERE or HAVING.

ordinal_position means the existing positional reference behavior, such as
GROUP BY 1 and ORDER BY 2. This behavior must remain unchanged.

unqualified_identifier means an identifier such as x or avg_s1. Qualified
names such as t.x, table1.x, and db.table1.x must not be treated as lateral
column alias references.
Semantic Rules

GROUP BY alias should be expanded to the corresponding SELECT expression
and then validated by the existing GROUP BY rules. The expanded expression
must be comparable and must not contain aggregate functions, window
functions, or grouping functions.

For example:

SELECT date_bin(1h, time) AS hour_time, AVG(s1) AS avg_s1
FROM table1
GROUP BY hour_time
ORDER BY hour_time;

Here, GROUP BY hour_time is equivalent to GROUP BY date_bin(1h, time).

The following query must fail:

SELECT AVG(s1) AS avg_s1
FROM table1
GROUP BY avg_s1;

It is equivalent to GROUP BY AVG(s1), which is invalid because an aggregate
expression cannot be used as a grouping key.

ORDER BY alias should resolve to the output column. The output type must be
orderable, and the existing sort direction and null ordering behavior
should remain unchanged.

For example:

SELECT s1 + 1 AS x
FROM table1
ORDER BY x DESC NULLS LAST;

Lateral column alias references should allow later SELECT items to
reference aliases defined earlier in the same SELECT list.

For example:

SELECT s1 + 1 AS x, x * 2 AS y
FROM table1;

SELECT AVG(s1) AS avg_s1, avg_s1 + 10 AS shifted_avg
FROM table1;

Chained lateral column aliases should also be supported:

SELECT s1 + 1 AS x, x * 2 AS y, y + 3 AS z
FROM table1;

This should be analyzed as if y were expanded from the already rewritten
expression for x.
Name Resolution Rules

In ORDER BY, if an unqualified name matches both a SELECT alias and an
input column, the SELECT alias takes precedence.

In GROUP BY, if an unqualified name matches both a SELECT alias and an
input column, the input column takes precedence.

In lateral column alias resolution, if an unqualified name matches both an
input column and a previous SELECT alias, the input column takes precedence.

Lateral column aliases may only reference aliases defined earlier in the
SELECT list. They cannot reference themselves or aliases defined later.

For example, the following query must fail because x is referenced before
it is defined:

SELECT x + 1 AS y, s1 AS x
FROM table1;

If multiple previous aliases have the same name, referencing that name
should produce an ambiguity error unless an input column with the same name
takes precedence.

For example:

SELECT s1 AS x, s2 AS x, x + 1
FROM table1;

If the input table does not contain a column named x, the third x should be
reported as ambiguous.

Similarly:

SELECT s1 AS x, s2 AS x
FROM table1
ORDER BY x;

ORDER BY x should be reported as ambiguous.

Subqueries must have independent SELECT alias scopes. Outer SELECT aliases
must not be visible inside subqueries, and aliases defined inside a
subquery must not affect the outer query scope.
Unsupported Cases

The following cases should remain unsupported:

SELECT s1 AS x
FROM table1
WHERE x > 1;

SELECT AVG(s1) AS avg_s1
FROM table1
HAVING avg_s1 > 1;

WHERE and HAVING should continue to use the existing name resolution rules
and should not resolve SELECT aliases.
Implementation Plan

The implementation should add SELECT alias metadata in StatementAnalyzer.
For each explicit SELECT alias, the analyzer should record:

canonical alias name
rewritten expression
SELECT item position

The canonical alias name should use Identifier.getCanonicalValue() to
preserve existing quoted and unquoted identifier behavior.

analyzeSelect should process SELECT items in order. For each SingleColumn,
it should first rewrite the expression using previously defined aliases,
then analyze the rewritten expression, and then record the rewritten
expression in both outputExpressions and Analysis.SelectExpression. After
the current SELECT item is analyzed, if it has an explicit alias, that
alias should be registered with the rewritten expression.

Lateral column alias rewriting should be implemented with the existing
ExpressionTreeRewriter. The rewriter should replace only unqualified
Identifier nodes. It should first try to resolve the identifier as an input
column in the source scope. If the input column exists, no alias
replacement should occur. If the input column does not exist, the rewriter
should look up previous SELECT aliases. A single match should be replaced
by the corresponding rewritten expression. Multiple matches should produce
an ambiguity error. No match should leave the identifier unchanged so that
the normal analyzer can report unresolved columns. The rewriter must not
traverse into SubqueryExpression.

GROUP BY analysis should be updated so that a simple Identifier first
resolves against the source scope. If it resolves as an input column, the
existing behavior is preserved. If it does not resolve as an input column,
it should be resolved against the SELECT alias metadata. A unique alias
match should be replaced with the alias’s rewritten expression before
existing GROUP BY validation.

After alias expansion, GROUP BY should continue to use the existing checks:

no aggregate functions
no window functions
no grouping functions
expression analysis
comparable type validation
gap-fill grouping key detection

GroupingSetAnalysis.originalExpressions, complexExpressions, grouping
expressions, and gap-fill grouping keys should all store the expanded
expression, not the original alias Identifier.

ORDER BY should keep output alias precedence. The current order-by scope
already exposes output fields before source fields, but the implementation
should ensure that duplicate aliases, input-column conflicts, aggregation
queries, and ordinal references are handled correctly.
analysis.setOrderByExpressions should record expressions that the planner
can translate reliably.

computeAndAssignOutputScope must use rewritten SELECT expressions rather
than the original SingleColumn.getExpression(). Output field type,
source-column analysis, and origin-column inference must be based on the
same rewritten expression stored in Analysis.SelectExpression. This avoids
inconsistencies where an expression has been analyzed after rewriting but
the output scope still refers to the original AST.

The planner should not introduce new alias semantics. It should consume the
already rewritten expressions from:

analysis.getSelectExpressions()
analysis.getOrderByExpressions()
analysis.getGroupingSets()

This keeps projection, aggregation, sorting, gap-fill, distinct, and window
validation consistent with the analyzer result.
Validation Plan

The fix should include analyzer tests and table-model integration tests for
the following cases:

GROUP BY alias
ORDER BY alias
lateral column alias with scalar expressions
lateral column alias with aggregate aliases
chained lateral column aliases
duplicate alias ambiguity
input column precedence over alias in GROUP BY
output alias precedence in ORDER BY
input column precedence over alias in LCA
qualified names not rewritten as aliases
subquery alias scope isolation
WHERE alias rejected
HAVING alias rejected
GROUP BY ordinal unchanged
ORDER BY ordinal unchanged
full expression GROUP BY unchanged
full expression ORDER BY unchanged
date_bin behavior unchanged
date_bin_gapfill behavior unchanged

The implementation should be verified with:

mvn spotless:apply -pl iotdb-core/datanode
mvn compile -pl iotdb-core/datanode
mvn test -pl iotdb-core/datanode -Dtest=<new-or-modified-test-class>
Expected Result

After this fix, IoTDB table model SQL should fully support SELECT alias
reuse in GROUP BY, ORDER BY, and later SELECT items. The implementation
should preserve existing ordinal behavior, existing expression semantics,
and existing query scope boundaries.

*Best regards,*

*Bryan Yang(杨易达)*

Reply via email to