Hi Apache IoTDB Community, I would like to work on Part 1 of issue #17797: https://github.com/apache/iotdb/issues/17797
This mail is to clarify the feature definition before implementation. Scope This PR will focus only on allowing SELECT column aliases to be referenced in GROUP BY and ORDER BY in the table model SQL analyzer. Lateral Column Alias references inside the SELECT list are out of scope for this PR and will be handled in a separate follow-up PR. Syntax The supported syntax is: SELECT <expression> [AS] <alias>, ... FROM <table> GROUP BY <alias> ORDER BY <alias> [ASC | DESC] [NULLS FIRST | NULLS LAST]; Components - <expression>: any valid SELECT expression already supported by the table model analyzer. - <alias>: an explicit alias defined by a SELECT item, for example: date_bin(1h, time) AS hour_time - GROUP BY <alias>: uses the corresponding SELECT expression as the grouping key. - ORDER BY <alias>: uses the corresponding SELECT expression as the ordering key. - ASC / DESC / NULLS FIRST / NULLS LAST keep the existing ORDER BY semantics. Type Rules The alias itself does not introduce a new data type. Its type is the type of the SELECT expression it refers to. For GROUP BY alias: - The resolved expression must be comparable. - The resolved expression must not contain aggregate, window, or grouping functions. - Existing GROUP BY semantic checks should run after alias resolution. For ORDER BY alias: - The resolved expression must be orderable. - Existing ORDER BY semantic checks should run after alias resolution. Name Resolution Rules For a single unqualified identifier in GROUP BY: 1. Try to resolve it as an input column first. 2. If it does not resolve to an input column, try to resolve it as a SELECT alias. 3. If multiple SELECT aliases match, report an ambiguity error. For a single unqualified identifier in ORDER BY: 1. Try to resolve it as a SELECT output alias first. 2. If no SELECT alias matches, fall back to the existing ORDER BY name resolution behavior. 3. If multiple SELECT aliases match, report an ambiguity error. This follows the convention mentioned in the issue: - GROUP BY prefers the input column. - ORDER BY prefers the SELECT output alias. Existing Behavior To Preserve The following existing forms should keep working unchanged: GROUP BY 1 ORDER BY 1 GROUP BY date_bin(1h, time) ORDER BY AVG(temperature) Aliases will not be introduced into WHERE or HAVING in this PR. Examples Example 1: GROUP BY and ORDER BY alias SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp FROM table1 GROUP BY hour_time ORDER BY hour_time; This should be analyzed as if the user wrote: SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp FROM table1 GROUP BY date_bin(1h, time) ORDER BY date_bin(1h, time); Example 2: GROUP BY collision prefers input column If table1 has an input column named hour_time: SELECT date_bin(1h, time) AS hour_time, AVG(temperature) FROM table1 GROUP BY hour_time; GROUP BY hour_time should resolve to the input column hour_time, not the SELECT alias. Example 3: ORDER BY collision prefers SELECT alias If table1 has an input column named hour_time: SELECT date_bin(1h, time) AS hour_time FROM table1 ORDER BY hour_time; ORDER BY hour_time should resolve to the SELECT alias. Example 4: Duplicate alias ambiguity SELECT s1 AS x, s2 AS x FROM table1 ORDER BY x; This should fail with a clear ambiguity error. Example 5: Aggregate alias in GROUP BY should fail SELECT AVG(temperature) AS avg_temp FROM table1 GROUP BY avg_temp; This should fail because GROUP BY cannot use an aggregate expression as a grouping key. Implementation Plan - Collect explicit SELECT aliases during SELECT analysis, keeping the alias name, original expression, and position. - During GROUP BY analysis, resolve a single identifier against SELECT aliases only when it does not resolve to an input column first. - During ORDER BY analysis, resolve a single identifier against SELECT aliases before falling back to the existing ORDER BY scope behavior. - Run the existing type, aggregation, and expression validation after alias resolution. - Keep Lateral Column Alias support out of this PR. Test Plan I plan to add tests for: - GROUP BY alias - ORDER BY alias - GROUP BY alias with date_bin/date_bin_gapfill-style expressions - alias/input-column collision behavior - duplicate alias ambiguity - aggregate alias used in GROUP BY should fail - existing ordinal references should keep working - existing full-expression references should keep working - SELECT DISTINCT with ORDER BY alias, if applicable Could you please confirm whether this feature definition looks reasonable? If there are no objections, I would like to work on Part 1 of this issue. Thank Bryan Yang(杨易达)
