[
https://issues.apache.org/jira/browse/CALCITE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941722#comment-17941722
]
Mihai Budiu edited comment on CALCITE-6939 at 4/7/25 9:41 PM:
--------------------------------------------------------------
I don't know why finding documentation these days on the internet is so
difficult...
Here are some links I could find:
* A short blog article about this topic:
[https://medium.com/inthepipeline/analytical-sql-tips-series-lateral-column-alias-9d28eecdc8df]
which claims support in DuckDB, Snowflake, Databricks, and Redshift.
* Databricks documentation:
[https://www.databricks.com/blog/introducing-support-lateral-column-alias]
* DuckDB mentions this feature
[https://duckdb.org/docs/stable/sql/dialect/friendly_sql.html], but does not
seem to document how name lookup is actually performed in their pdf
documentation [https://duckdb.org/duckdb-docs.pdf]
* Snowflake: a blog post about it:
[https://sqlkover.com/cool-stuff-in-snowflake-part-4-aliasing-all-the-things/,]
but no real documentation about the semantics
* Redshift:
[https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/],
[https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html]
RedShift and Databricks have a similar lookup rule; here is a quote from the
RedShift documentation:
{noformat}
The alias is recognized right after it is defined in the target list.
You can use an alias in other expressions defined after it in the same target
list.
The following example illustrates this.
select clicks / impressions as probability, round(100 * probability, 1) as
percentage from raw_data;
The benefit of the lateral alias reference is you don't need to repeat the
aliased
expression when building more complex expressions in the same target list.
When Amazon Redshift parses this type of reference, it just inlines the
previously
defined aliases. If there is a column with the same name defined in the FROM
clause
as the previously aliased expression, the column in the FROM clause takes
priority.
For example, in the above query if there is a column named 'probability' in
table
raw_data, the 'probability' in the second expression in the target list refers
to
that column instead of the alias name 'probability'.
{noformat}
Clearly, this is a breaking change, because it makes some otherwise illegal
queries become legal. It is also technically just syntactic sugar, since it is
not absolutely necessary. But if you have a user who needs to port a 10KLOC SQL
program from Spark SQL to a Calcite-based tool, they will appreciate this
feature, even if the Spark SQL design is not the best one. That's why it this
feature should be behind a conformance flag.
I am thinking to make the conformance flag an enum with 3 values: Unsupported
(standard behavior), LeftToRight (my previous proposal), AnyOrder (similar to
measures). The two databases that document this choose to lookup in the same
SELECT last, and that sounds to me like the right choice.
We could also add new syntax to clarify scoping, either on definition, like
MEASURE, or on lookup, but the downside is that these don't look like anything
out there, so the porting effort may be diminished, but not eliminated, using
some keyword, e.g., NEW:
* SELECT NEW x+1 to indicate that the lookup of x should be done in the local
select statement,
* SELECT expr AS NEW x to indicate that x should be visible in the same
statement
was (Author: JIRAUSER295926):
I don't know why finding documentation these days on the internet is so
difficult...
Here are some links I could find:
* A short blog article about this topic:
[https://medium.com/inthepipeline/analytical-sql-tips-series-lateral-column-alias-9d28eecdc8df]
which claims support in DuckDB, Snowflake, Databricks, and Redshift.
* Databricks documentation:
[https://www.databricks.com/blog/introducing-support-lateral-column-alias]
* DuckDB mentions this feature
[https://duckdb.org/docs/stable/sql/dialect/friendly_sql.html], but does not
seem to document how name lookup is actually performed in their pdf
documentation [https://duckdb.org/duckdb-docs.pdf]
* Snowflake: a blog post about it:
[https://sqlkover.com/cool-stuff-in-snowflake-part-4-aliasing-all-the-things/,]
but no real documentation about the semantics
* Redshift:
[https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/],
[https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html]
RedShift and Databricks have a similar lookup rule; here is a quote from the
RedShift documentation:
{noformat}
The alias is recognized right after it is defined in the target list. You can
use an alias in other expressions defined after it in the same target list. The
following example illustrates this.
select clicks / impressions as probability, round(100 * probability, 1) as
percentage from raw_data;
The benefit of the lateral alias reference is you don't need to repeat the
aliased expression when building more complex expressions in the same target
list. When Amazon Redshift parses this type of reference, it just inlines the
previously defined aliases. If there is a column with the same name defined in
the FROM clause as the previously aliased expression, the column in the FROM
clause takes priority. For example, in the above query if there is a column
named 'probability' in table raw_data, the 'probability' in the second
expression in the target list refers to that column instead of the alias name
'probability'.
{noformat}
Clearly, this is a breaking change, because it makes some otherwise illegal
queries become legal. It is also technically just syntactic sugar, since it is
not absolutely necessary. But if you have a user who needs to port a 10KLOC SQL
program from Spark SQL to a Calcite-based tool, they will appreciate this
feature, even if the Spark SQL design is not the best one. That's why it this
feature should be behind a conformance flag.
I am thinking to make the conformance flag an enum with 3 values: Unsupported
(standard behavior), LeftToRight (my previous proposal), AnyOrder (similar to
measures). The two databases that document this choose to lookup in the same
SELECT last, and that sounds to me like the right choice.
We could also add new syntax to clarify scoping, either on definition, like
MEASURE, or on lookup, but the downside is that these don't look like anything
out there, so the porting effort may be diminished, but not eliminated, using
some keyword, e.g., NEW:
* SELECT NEW x+1 to indicate that the lookup of x should be done in the local
select statement,
* SELECT expr AS NEW x to indicate that x should be visible in the same
statement
> Add support for Lateral Column Alias
> ------------------------------------
>
> Key: CALCITE-6939
> URL: https://issues.apache.org/jira/browse/CALCITE-6939
> Project: Calcite
> Issue Type: Wish
> Components: core
> Affects Versions: 1.39.0
> Reporter: Mihai Budiu
> Priority: Minor
> Labels: pull-request-available
>
> Several SQL dialects allow a SELECT to refer to aliases introduced previously
> in the same select.
> Examples are: Spark, BigQuery, Snowflake. Other names for this feature are
> "column aliasing with immediate reference", or "column alias visibility in
> the same SELECT statement".
> This feature was already proposed and implemented by [CALCITE-5248], but that
> issue and PR contains multiple other unrelated features.
> I am also proposing adding a conformance flag `isSelectAlias` which enables
> this feature. This is strictly a validator feature.
> The semantics has to be clearly defined; we do not want to change the
> semantics of any existing query. My proposal is to allow a name to be
> resolved in the current SELECT statement only if all the other name lookup
> methods have failed.
> I also think it's a good idea to introduce new aliases from left to right, so
> a expression can only refer to aliases introduced to the left of it. This
> removes a few unpleasant corner cases too.
> Another difficult problem is the handling of nondeterministic expressions.
> When you refer to a prior expression by an alias, can/is the expression
> re-evaluated? Unfortunately I believe that the current Calcite IR makes it
> difficult to represent the program in such a way that re-evaluation is
> avoided. (But this is a problem with many other Calcite rewrite rules as
> well.) This feature is still useful when applied to deterministic expressions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)