[
https://issues.apache.org/jira/browse/CALCITE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869965#comment-17869965
]
Mihai Budiu edited comment on CALCITE-6372 at 7/31/24 4:20 PM:
---------------------------------------------------------------
I read the DuckDB documentation for ASOF. In some respects it is a better
design. But I am not sure that their documentation actually documents the
*restrictions* on the ASOF join. So their syntax may only look better, but it
may be restricted by the implementation.
For example, Snowflake forces the ASOF join condition to be a combination of
equi-join plus ONE comparison between two columns in the two tables. This form
of ASOF join is actually what applications need, and what can be optimized in
an implementation. The DuckDB documentation says nothing about the condition
structure, but I suspect will require the same constraints to be efficient. If
I am right, then the only real difference is that the DuckDB syntax infers the
comparison from the syntax of the Boolean expression, whereas Snowflake has it
explicit in the MATCH_CONDITION.
Another thing which I don't like in the DuckDB design is the ASOF JOIN with
USING. The current PR does not support USING yet, but I think that the
Snowflake design is better. In Duck DB they say that the last column in the
USING list is the one compared using a default comparison. I find this very
confusing for users - they write syntactically an equi-join, but get something
different.
Another thing which I think is wrong in the DuckDB design is the comparison of
NULL values (that is not documented, but based on testing). DuckDB compares
NULLs in joins higher than other values. But this is inconsistent with how
joins use NULLs, which can never match. I think it's a good rule that the
result of an ASOF join is always a subset of the result of the same SQL
statement with "ASOF" removed (i.e., a standard join). This is not true for
DuckDB's implementation.
One thing I have taken from the DuckDB design is making the ASOF join an INNER
join. In Snowflake it's always an outer join, but I don't think there is any
good reason for that. So in the current PR you can write both ASOF and LEFT
ASOF. (There is no RIGHT ASOF, the left table is always special.)
was (Author: JIRAUSER295926):
I read the DuckDB documentation for ASOF. In some respects it is a better
design. But I am not sure that their documentation actually documents the
*restrictions* on the ASOF join. So their syntax may only look better, but it
may be restricted by the implementation.
For example, Snowflake forces the ASOF join condition to be a combination of
equi-join plus ONE comparison between two columns in the two tables. This form
of ASOF join is actually what applications need, and what can be optimized in
an implementation. The DuckDB documentation says nothing about the condition
structure, but I suspect will require the same constraints to be efficient. If
I am right, then the only real difference is that the DuckDB syntax infers the
comparison from the syntax of the Boolean expression, whereas Snowflake has it
explicit in the MATCH_CONDITION.
Another thing which I don't like in the DuckDB design is the ASOF JOIN with
USING. The current PR does not support USING yet, but I think that the
Snowflake design is better. In Duck DB they say that the last column in the
USING list is the one compared using a default comparison. I find this very
confusing for users - they write syntactically an equi-join, but get something
different.
Another thing which I think is wrong in the DuckDB design is the comparison of
NULL values (that is not documented, but based on testing). DuckDB compares
NULLs in joins higher than other values. But this is inconsistent with how
joins use NULLs, which can never match. I think it's a good rule that the
result of an ASOF join is always a subset of the result of the same SQL
statement with "ASOF" remoted (i.e., a standard join). This is not true for
DuckDB's implementation.
One thing I have taken from the DuckDB design is making the ASOF join an INNER
join. In Snowflake it's always an outer join, but I don't think there is any
good reason for that. So in the current PR you can write both ASOF and LEFT
ASOF. (There is no RIGHT ASOF, the left table is always special.)
> Support ASOF joins
> ------------------
>
> Key: CALCITE-6372
> URL: https://issues.apache.org/jira/browse/CALCITE-6372
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.36.0
> Reporter: Mihai Budiu
> Assignee: Mihai Budiu
> Priority: Minor
> Labels: pull-request-available
>
> Seems that this new kind of JOIN named AS OF is very useful for processing
> time-series data. Here is some example documentation from Snowflake:
> https://docs.snowflake.com/en/sql-reference/constructs/asof-join
> The semantics is similar to a traditional join, but the result always
> contains at most one record from the left side, with the last​ matching
> record on the right side (where "time" is any value that can be compared for
> inequality). This can be expressed in SQL, but it looks very cumbersome,
> using a JOIN, a GROUP BY, and then an aggregation to keep the last value.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)