Hi Bryan,

The definition looks good to me, I'll assign this issue to you.


Best regards,
-------------------------
Yuan Tian


On Wed, Jun 3, 2026 at 11:43 AM Bryan Yang <[email protected]> wrote:

> Hi Apache IoTDB Community,
>
> I would like to work on Part 1 of issue #17797:
> https://github.com/apache/iotdb/issues/17797
>
> This mail is to clarify the feature definition before implementation.
>
> Scope
>
> This PR will focus only on allowing SELECT column aliases to be referenced
> in GROUP BY and ORDER BY in the table model SQL analyzer.
>
> Lateral Column Alias references inside the SELECT list are out of scope for
> this PR and will be handled in a separate follow-up PR.
>
> Syntax
>
> The supported syntax is:
>
> SELECT <expression> [AS] <alias>, ...
> FROM <table>
> GROUP BY <alias>
> ORDER BY <alias> [ASC | DESC] [NULLS FIRST | NULLS LAST];
>
> Components
>
> - <expression>: any valid SELECT expression already supported by the table
> model analyzer.
> - <alias>: an explicit alias defined by a SELECT item, for example:
>   date_bin(1h, time) AS hour_time
> - GROUP BY <alias>: uses the corresponding SELECT expression as the
> grouping key.
> - ORDER BY <alias>: uses the corresponding SELECT expression as the
> ordering key.
> - ASC / DESC / NULLS FIRST / NULLS LAST keep the existing ORDER BY
> semantics.
>
> Type Rules
>
> The alias itself does not introduce a new data type. Its type is the type
> of the SELECT expression it refers to.
>
> For GROUP BY alias:
> - The resolved expression must be comparable.
> - The resolved expression must not contain aggregate, window, or grouping
> functions.
> - Existing GROUP BY semantic checks should run after alias resolution.
>
> For ORDER BY alias:
> - The resolved expression must be orderable.
> - Existing ORDER BY semantic checks should run after alias resolution.
>
> Name Resolution Rules
>
> For a single unqualified identifier in GROUP BY:
>
> 1. Try to resolve it as an input column first.
> 2. If it does not resolve to an input column, try to resolve it as a SELECT
> alias.
> 3. If multiple SELECT aliases match, report an ambiguity error.
>
> For a single unqualified identifier in ORDER BY:
>
> 1. Try to resolve it as a SELECT output alias first.
> 2. If no SELECT alias matches, fall back to the existing ORDER BY name
> resolution behavior.
> 3. If multiple SELECT aliases match, report an ambiguity error.
>
> This follows the convention mentioned in the issue:
> - GROUP BY prefers the input column.
> - ORDER BY prefers the SELECT output alias.
>
> Existing Behavior To Preserve
>
> The following existing forms should keep working unchanged:
>
> GROUP BY 1
> ORDER BY 1
> GROUP BY date_bin(1h, time)
> ORDER BY AVG(temperature)
>
> Aliases will not be introduced into WHERE or HAVING in this PR.
>
> Examples
>
> Example 1: GROUP BY and ORDER BY alias
>
> SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
> FROM table1
> GROUP BY hour_time
> ORDER BY hour_time;
>
> This should be analyzed as if the user wrote:
>
> SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
> FROM table1
> GROUP BY date_bin(1h, time)
> ORDER BY date_bin(1h, time);
>
> Example 2: GROUP BY collision prefers input column
>
> If table1 has an input column named hour_time:
>
> SELECT date_bin(1h, time) AS hour_time, AVG(temperature)
> FROM table1
> GROUP BY hour_time;
>
> GROUP BY hour_time should resolve to the input column hour_time, not the
> SELECT alias.
>
> Example 3: ORDER BY collision prefers SELECT alias
>
> If table1 has an input column named hour_time:
>
> SELECT date_bin(1h, time) AS hour_time
> FROM table1
> ORDER BY hour_time;
>
> ORDER BY hour_time should resolve to the SELECT alias.
>
> Example 4: Duplicate alias ambiguity
>
> SELECT s1 AS x, s2 AS x
> FROM table1
> ORDER BY x;
>
> This should fail with a clear ambiguity error.
>
> Example 5: Aggregate alias in GROUP BY should fail
>
> SELECT AVG(temperature) AS avg_temp
> FROM table1
> GROUP BY avg_temp;
>
> This should fail because GROUP BY cannot use an aggregate expression as a
> grouping key.
>
> Implementation Plan
>
> - Collect explicit SELECT aliases during SELECT analysis, keeping the alias
> name, original expression, and position.
> - During GROUP BY analysis, resolve a single identifier against SELECT
> aliases only when it does not resolve to an input column first.
> - During ORDER BY analysis, resolve a single identifier against SELECT
> aliases before falling back to the existing ORDER BY scope behavior.
> - Run the existing type, aggregation, and expression validation after alias
> resolution.
> - Keep Lateral Column Alias support out of this PR.
>
> Test Plan
>
> I plan to add tests for:
>
> - GROUP BY alias
> - ORDER BY alias
> - GROUP BY alias with date_bin/date_bin_gapfill-style expressions
> - alias/input-column collision behavior
> - duplicate alias ambiguity
> - aggregate alias used in GROUP BY should fail
> - existing ordinal references should keep working
> - existing full-expression references should keep working
> - SELECT DISTINCT with ORDER BY alias, if applicable
>
> Could you please confirm whether this feature definition looks reasonable?
> If there are no objections, I would like to work on Part 1 of this issue.
>
> Thank
> Bryan Yang(杨易达)
>

Reply via email to