[
https://issues.apache.org/jira/browse/CALCITE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166580#comment-17166580
]
Rui Wang commented on CALCITE-3272:
-----------------------------------
[~danny0405][~liupengcheng]
I have some prototype work, but I have been busy with my daily work recently,
so I stopped working on this. I might be able to pick it up again in the near
term.
I can see there were some different opinions on how we should implement EMIT
strategies. At least to me, right now I am leaning towards building "EMIT
expressions", e.g. EMIT WHEN current_timestamp() >= window_end, EMIT WHEN
COUNT(*) >= 100, etc. I believe EMIT expressions are 1) easy to understand
semantically, because you can just imagine an engine applying the expressions
to the data buffered in the windowing function scan and emitting when an
expression evaluates to true, and 2) highly extensible, because they are
expressions.
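To make that mental model concrete, here is a minimal sketch in Python (not
Calcite code). It assumes rows are buffered per window and that an EMIT
expression is simply a predicate over a buffer and the current time. The names
WindowBuffer, emit_when, after_window_end and count_at_least are hypothetical
and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WindowBuffer:
    """Rows buffered for one window (hypothetical structure, illustration only)."""
    window_end: int  # window end, in minutes since midnight
    rows: List[dict] = field(default_factory=list)


# An EMIT expression is modelled as a predicate over (buffer, current time).
EmitExpr = Callable[[WindowBuffer, int], bool]


def after_window_end(buf: WindowBuffer, now: int) -> bool:
    # Corresponds to: EMIT WHEN current_timestamp() >= window_end
    return now >= buf.window_end


def count_at_least(n: int) -> EmitExpr:
    # Corresponds to: EMIT WHEN COUNT(*) >= n
    return lambda buf, now: len(buf.rows) >= n


def emit_when(buffers: List[WindowBuffer], expr: EmitExpr, now: int) -> List[WindowBuffer]:
    """Return the buffers whose EMIT expression evaluates to true."""
    return [b for b in buffers if expr(b, now)]


w1 = WindowBuffer(window_end=8 * 60 + 10,
                  rows=[{"item": "A"}, {"item": "C"}, {"item": "D"}])
w2 = WindowBuffer(window_end=8 * 60 + 20, rows=[{"item": "B"}])

# At 8:12 only the window ending at 8:10 satisfies the time-based expression.
ready = emit_when([w1, w2], after_window_end, now=8 * 60 + 12)
print([b.window_end for b in ready])
```

Because both conditions are ordinary predicates, a new condition is one more
function and the evaluation loop does not change, which is the extensibility
argument in point 2.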
Then build several fixed strategies as keywords on top of EMIT expressions.
For example, support "EMIT AFTER Watermark" and then translate this strategy
to "EMIT WHEN current_timestamp() >= window_end" in the plan.
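As a rough illustration of that translation step, the following Python sketch
rewrites a keyword strategy into its EMIT WHEN form using plain strings. The
table STRATEGY_TO_EXPR and the function desugar_emit are hypothetical; a real
implementation would operate on parsed SQL nodes rather than text.

```python
# Hypothetical mapping from fixed strategy keywords to EMIT expressions.
# Only the one strategy mentioned in this comment is listed.
STRATEGY_TO_EXPR = {
    "AFTER WATERMARK": "current_timestamp() >= window_end",
}


def desugar_emit(clause: str) -> str:
    """Rewrite 'EMIT <strategy>' as 'EMIT WHEN <expr>'; keep EMIT WHEN as is."""
    body = clause.strip()
    if not body.upper().startswith("EMIT "):
        raise ValueError(f"not an EMIT clause: {clause!r}")
    rest = body[len("EMIT "):].strip()
    if rest.upper().startswith("WHEN "):
        # Already an EMIT expression; only normalize the keyword spelling.
        return "EMIT WHEN " + rest[len("WHEN "):].strip()
    key = " ".join(rest.upper().split())  # case- and whitespace-insensitive
    if key not in STRATEGY_TO_EXPR:
        raise ValueError(f"unknown EMIT strategy: {rest!r}")
    return "EMIT WHEN " + STRATEGY_TO_EXPR[key]


print(desugar_emit("EMIT AFTER Watermark"))
print(desugar_emit("EMIT WHEN COUNT(*) >= 100"))
```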
The only uncertainty right now is how to model "watermark" as part of the SQL
algebra. My thought is to offer watermark functions, e.g. current_watermark()
to get the current watermark position. In the systems I have seen, there is
usually something like a "watermark aggregator" that watermark functions can
call. In Calcite, that pretty much means adding a centralized watermark
aggregator to track watermarks on each RelNode.
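A very rough sketch of what such an aggregator could look like, written in
Python purely for illustration. The class WatermarkAggregator and its methods
are hypothetical, not an existing Calcite interface. It also assumes, as many
streaming systems do, that watermarks never move backwards and that the
watermark of a multi-input node is the minimum over its inputs.

```python
from typing import Dict, Iterable, Optional


class WatermarkAggregator:
    """Central registry of the latest watermark per plan node (hypothetical)."""

    def __init__(self) -> None:
        self._by_node: Dict[str, int] = {}

    def report(self, node_id: str, watermark: int) -> None:
        # Assumption: watermarks are monotonic, so older reports are ignored.
        prev = self._by_node.get(node_id)
        if prev is None or watermark > prev:
            self._by_node[node_id] = watermark

    def current_watermark(self, node_id: str) -> Optional[int]:
        """What a current_watermark() function could return for one node."""
        return self._by_node.get(node_id)

    def combined(self, node_ids: Iterable[str]) -> Optional[int]:
        """Minimum watermark over several inputs; None if any is unknown."""
        marks = [self._by_node.get(n) for n in node_ids]
        if not marks or any(m is None for m in marks):
            return None
        return min(marks)


agg = WatermarkAggregator()
agg.report("scan_a", 490)   # 8:10, in minutes since midnight
agg.report("scan_b", 485)   # 8:05
agg.report("scan_a", 480)   # stale report, ignored
print(agg.current_watermark("scan_a"), agg.combined(["scan_a", "scan_b"]))
```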
Finally, of course there are more details about how Calcite interfaces could be
changed. I should draft a design doc for this.
> TUMBLE Table-valued Function
> ----------------------------
>
> Key: CALCITE-3272
> URL: https://issues.apache.org/jira/browse/CALCITE-3272
> Project: Calcite
> Issue Type: Sub-task
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.22.0
>
> Time Spent: 19h 20m
> Remaining Estimate: 0h
>
> Define a builtin TVF: Tumble (data , timecol , dur, [ offset ])
> The return value of Tumble is a relation that includes all columns of data as
> well as additional event time columns window_start and window_end.
> Examples of TUMBLE TVF are (from https://s.apache.org/streaming-beam-sql ):
> 8:21> SELECT * FROM Bid;
> --------------------------
> | bidtime | price | item |
> --------------------------
> | 8:07    | $2    | A    |
> | 8:11    | $3    | B    |
> | 8:05    | $4    | C    |
> | 8:09    | $5    | D    |
> | 8:13    | $1    | E    |
> | 8:17    | $6    | F    |
> --------------------------
> 8:21> SELECT *
>       FROM TABLE Tumble (
>         data    => TABLE Bid ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES ,
>         offset  => INTERVAL '0' MINUTES );
> ------------------------------------------------------
> | window_start | window_end | bidtime | price | item |
> ------------------------------------------------------
> | 8:00         | 8:10       | 8:07    | $2    | A    |
> | 8:10         | 8:20       | 8:11    | $3    | B    |
> | 8:00         | 8:10       | 8:05    | $4    | C    |
> | 8:00         | 8:10       | 8:09    | $5    | D    |
> | 8:10         | 8:20       | 8:13    | $1    | E    |
> | 8:10         | 8:20       | 8:17    | $6    | F    |
> ------------------------------------------------------
> 8:21> SELECT MAX ( window_start ) , window_end , SUM ( price )
>       FROM TABLE Tumble (
>         data    => TABLE ( Bid ) ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES )
>       GROUP BY window_end;
> -------------------------------------
> | window_start | window_end | price |
> -------------------------------------
> | 8:00         | 8:10       | $11   |
> | 8:10         | 8:20       | $10   |
> -------------------------------------
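For reference, the window assignment in the quoted examples can be reproduced
with a few lines of Python. This is only a sketch of the arithmetic, not the
Calcite implementation: it assumes times are integer minutes since midnight
and that a window starts at ts - ((ts - offset) mod dur). The helpers hhmm,
fmt and tumble are hypothetical names.

```python
from collections import defaultdict


def hhmm(text):
    """Parse 'H:MM' into minutes since midnight."""
    h, m = text.split(":")
    return int(h) * 60 + int(m)


def fmt(minutes):
    """Format minutes since midnight back to 'H:MM'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def tumble(rows, timecol, dur, offset=0):
    """Add window_start and window_end to each row (assumed semantics)."""
    out = []
    for row in rows:
        ts = row[timecol]
        start = ts - (ts - offset) % dur
        out.append({"window_start": start, "window_end": start + dur, **row})
    return out


# The Bid table from the quoted description.
bids = [("8:07", 2, "A"), ("8:11", 3, "B"), ("8:05", 4, "C"),
        ("8:09", 5, "D"), ("8:13", 1, "E"), ("8:17", 6, "F")]
rows = [{"bidtime": hhmm(t), "price": p, "item": i} for t, p, i in bids]

windowed = tumble(rows, "bidtime", dur=10)

# Equivalent of SUM(price) ... GROUP BY window_end.
totals = defaultdict(int)
for r in windowed:
    totals[r["window_end"]] += r["price"]
summary = {fmt(k): v for k, v in sorted(totals.items())}
print(summary)
```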