[jira] [Comment Edited] (CALCITE-6372) Support ASOF joins

Mihai Budiu (Jira) Wed, 31 Jul 2024 09:21:03 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869965#comment-17869965
 ]


Mihai Budiu edited comment on CALCITE-6372 at 7/31/24 4:20 PM:
---------------------------------------------------------------

I read the DuckDB documentation for ASOF. In some respects it is a better 
design. But I am not sure that their documentation actually documents the 
*restrictions* on the ASOF join. So their syntax may only look better, but it 
may be restricted by the implementation.

For example, Snowflake forces the ASOF join condition to be a combination of 
equi-join plus ONE comparison between two columns in the two tables. This form 
of ASOF join is actually what applications need, and what can be optimized in 
an implementation. The DuckDB documentation says nothing about the condition 
structure, but I suspect will require the same constraints to be efficient. If 
I am right, then the only real difference is that the DuckDB syntax infers the 
comparison from the syntax of the Boolean expression, whereas Snowflake has it 
explicit in the MATCH_CONDITION.

Another thing which I don't like in the DuckDB design is the ASOF JOIN with 
USING. The current PR does not support USING yet, but I think that the 
Snowflake design is better. In Duck DB they say that the last column in the 
USING list is the one compared using a default comparison. I find this very 
confusing for users - they write syntactically an equi-join, but get something 
different.

Another thing which I think is wrong in the DuckDB design is the comparison of 
NULL values (that is not documented, but based on testing). DuckDB compares 
NULLs in joins higher than other values. But this is inconsistent with how 
joins use NULLs, which can never match. I think it's a good rule that the 
result of an ASOF join is always a subset of the result of the same SQL 
statement with "ASOF" removed (i.e., a standard join). This is not true for 
DuckDB's implementation.

One thing I have taken from the DuckDB design is making the ASOF join an INNER 
join. In Snowflake it's always an outer join, but I don't think there is any 
good reason for that. So in the current PR you can write both ASOF and LEFT 
ASOF. (There is no RIGHT ASOF, the left table is always special.)


was (Author: JIRAUSER295926):
I read the DuckDB documentation for ASOF. In some respects it is a better 
design. But I am not sure that their documentation actually documents the 
*restrictions* on the ASOF join. So their syntax may only look better, but it 
may be restricted by the implementation.

For example, Snowflake forces the ASOF join condition to be a combination of 
equi-join plus ONE comparison between two columns in the two tables. This form 
of ASOF join is actually what applications need, and what can be optimized in 
an implementation. The DuckDB documentation says nothing about the condition 
structure, but I suspect will require the same constraints to be efficient. If 
I am right, then the only real difference is that the DuckDB syntax infers the 
comparison from the syntax of the Boolean expression, whereas Snowflake has it 
explicit in the MATCH_CONDITION.

Another thing which I don't like in the DuckDB design is the ASOF JOIN with 
USING. The current PR does not support USING yet, but I think that the 
Snowflake design is better. In Duck DB they say that the last column in the 
USING list is the one compared using a default comparison. I find this very 
confusing for users - they write syntactically an equi-join, but get something 
different.

Another thing which I think is wrong in the DuckDB design is the comparison of 
NULL values (that is not documented, but based on testing). DuckDB compares 
NULLs in joins higher than other values. But this is inconsistent with how 
joins use NULLs, which can never match. I think it's a good rule that the 
result of an ASOF join is always a subset of the result of the same SQL 
statement with "ASOF" remoted (i.e., a standard join). This is not true for 
DuckDB's implementation.

One thing I have taken from the DuckDB design is making the ASOF join an INNER 
join. In Snowflake it's always an outer join, but I don't think there is any 
good reason for that. So in the current PR you can write both ASOF and LEFT 
ASOF. (There is no RIGHT ASOF, the left table is always special.)

> Support ASOF joins
> ------------------
>
>                 Key: CALCITE-6372
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6372
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.36.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>              Labels: pull-request-available
>
> Seems that this new kind of JOIN named AS OF is very useful for processing 
> time-series data. Here is some example documentation from Snowflake: 
> https://docs.snowflake.com/en/sql-reference/constructs/asof-join
> The semantics is similar to a traditional join, but the result always 
> contains at most one record from the left side, with the last matching 
> record on the right side (where "time" is any value that can be compared for 
> inequality). This can be expressed in SQL, but it looks very cumbersome, 
> using a JOIN, a GROUP BY, and then an aggregation to keep the last value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-6372) Support ASOF joins

Reply via email to