I would regard this as two separate but related things: a new SQL syntax for 
joins, and a new relational operator. It is definitely worth keeping them 
separate; the operator will not map 1-1 to the syntax, may require its input 
to be sorted, and of course we would want queries to be able to use the 
operator even if they didn’t use the syntax.

The relational operator can have physical implementations in various calling 
conventions. Or even flags extending existing algorithms (e.g. add a 
‘keepAtMostOneOnLeft’ flag to EnumerableMergeJoin).
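
To make the merge-based idea concrete, here is a minimal sketch in plain Java. 
It is not Calcite code: the class AsofMergeSketch, the Row and Pair records, 
and the asofJoin method are all hypothetical names. It assumes both inputs are 
already sorted ascending on the time column, that the match condition is 
"right.time <= left.time", and that unmatched left rows are kept with a null 
right side.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not part of Calcite. */
class AsofMergeSketch {
  /** Hypothetical row type: a comparable "time" plus a payload. */
  record Row(long time, String value) {}

  /** Hypothetical output pair; right is null when nothing matches. */
  record Pair(Row left, Row right) {}

  /**
   * Single merge pass over two inputs sorted ascending by time.
   * Each left row is paired with the last right row whose time is
   * less than or equal to the left row's time.
   */
  static List<Pair> asofJoin(List<Row> left, List<Row> right) {
    List<Pair> out = new ArrayList<>();
    int r = 0;            // cursor into the right input
    Row lastMatch = null; // latest right row seen so far that qualifies
    for (Row l : left) {
      // Advance the right cursor while rows are not later than the left row.
      while (r < right.size() && right.get(r).time() <= l.time()) {
        lastMatch = right.get(r);
        r++;
      }
      // Emit exactly one output row per left row.
      out.add(new Pair(l, lastMatch));
    }
    return out;
  }
}
```

Because the right cursor never moves backwards, the pass is linear in the size 
of the two inputs, which is why requiring sorted input may be worthwhile.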

Regarding whether to represent the operator as a subclass of Join or just a 
subclass of BiRel: I recommend making it a subclass of Join, but we have to 
take care that rewrite rules and metadata rules designed to apply to regular 
joins do not accidentally apply to these joins. We’ve already done that with 
semi-join, so it shouldn’t be too hard to follow those breadcrumbs.

I recently read “The Complete Story of Joins (in HyPer)” [1], which contains 
some other interesting and useful join variants: dependent join and mark join. We 
should consider adding these as relational operators, in the same way that we 
add asof-join.

Julian

[1] 
http://btw2017.informatik.uni-stuttgart.de/slidesandpapers/F1-10-37/paper_web.pdf

> On Apr 15, 2024, at 2:19 PM, Mihai Budiu <mbu...@gmail.com> wrote:
> 
> Hello,
> 
> Seems that this new kind of JOIN named AS OF is very useful for processing 
> time-series data. Here is some example documentation from Snowflake: 
> https://docs.snowflake.com/en/sql-reference/constructs/asof-join
> 
> The semantics is similar to a traditional join, but the result always 
> contains at most one record from the left side, with the last matching 
> record on the right side (where "time" is any value that can be compared for 
> inequality). This can be expressed in SQL, but it looks very cumbersome, 
> using a JOIN, a GROUP BY, and then an aggregation to keep the last value.
> 
> I haven't seen anything like that in Calcite, although Calcite does seem to 
> have support for all sorts of temporal and stream notions.
> 
> If one were to implement it, what would be the right way to do it? A subclass 
> of Join? A new type of BiRel RelNode?
> 
> Thank you,
> Mihai
