[
https://issues.apache.org/jira/browse/CALCITE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166836#comment-17166836
]
Danny Chen commented on CALCITE-3272:
-------------------------------------
bq. The only uncertainty right now is how to model "watermark" as a part of SQL
algebra. My thought is offer watermark functions. E.g. current_watermark() to
get current watermark position. In systems I have seen, there should be
something like "watermark aggregator" that watermark functions can calls to. In
Calcite it pretty much means we add a centralized watermark aggregator to track
watermarks on each RelNode.
The benefit of it is that we can describe some behaviors of watermark using the
SQL syntax, Apache Flink now "hidden" the watermark as a framework interval
events. The operators do not need to care about the watermarks if they are not
sensitive to the emit strategy.
I'm not sure if adding watermark to RelNodes is a good idea, first of all, only
the operators that buffer the data need emit strategy, such as the group window.
> TUMBLE Table-valued Function
> ----------------------------
>
> Key: CALCITE-3272
> URL: https://issues.apache.org/jira/browse/CALCITE-3272
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.22.0
>
> Time Spent: 19h 20m
> Remaining Estimate: 0h
>
> Define a builtin TVF: Tumble (data , timecol , dur, [ offset ])
> The return value of Tumble is a relation that includes all columns of data as
> well as additional event time columns window_start and window_end.
> Examples of TUMBLE TVF are (from https://s.apache.org/streaming-beam-sql ):
> 8:21> SELECT * FROM Bid;
> --------------------------
> | bidtime | price | item |
> --------------------------
> | 8:07 | $2 | A |
> | 8:11 | $3 | B |
> | 8:05 | $4 | C |
> | 8:09 | $5 | D |
> | 8:13 | $1 | E |
> | 8:17 | $6 | F |
> --------------------------
> 8:21> SELECT *
> FROM TABLE Tumble (
> data => TABLE Bid ,
> timecol => DESCRIPTOR ( bidtime ) ,
> dur => INTERVAL '10' MINUTES ,
> offset => INTERVAL '0' MINUTES );
> ------------------------------------------
> | window_start | window_end | bidtime | price | item |
> ------------------------------------------
> | 8:00 | 8:10 | 8:07 | $2 | A |
> | 8:10 | 8:20 | 8:11 | $3 | B |
> | 8:00 | 8:10 | 8:05 | $4 | C |
> | 8:00 | 8:10 | 8:09 | $5 | D |
> | 8:10 | 8:20 | 8:13 | $1 | E |
> | 8:10 | 8:20 | 8:17 | $6 | F |
> ------------------------------------------
> 8:21> SELECT MAX ( window_start ) , window_end , SUM ( price )
> FROM TABLE Tumble (
> data => TABLE ( Bid ) ,
> timecol => DESCRIPTOR ( bidtime ) ,
> dur => INTERVAL '10 ' MINUTES )
> GROUP BY wend;
> -------------------------
> | window_start | window_end | price |
> -------------------------
> | 8:00 | 8:10 | $11 |
> | 8:10 | 8:20 | $10 |
> -------------------------
--
This message was sent by Atlassian Jira
(v8.3.4#803005)