[ 
https://issues.apache.org/jira/browse/CALCITE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166580#comment-17166580
 ] 

Rui Wang commented on CALCITE-3272:
-----------------------------------

[~danny0405][~liupengcheng]

I got some prototyping but recently I am busy on my daily work so I stopped 
working on this. I might be able to pick it up again in near term.

I can see there were some different opinions on how we should implement EMIT 
strategies. At least to me, right now I am leaning to build "EMIT expression", 
e.g. EMIT WHEN current_timestamp() >= window_end, EMIT WHEN COUNT(*) >= 100, 
etc. I believe EMIT expressions are 1) easy to understand its semantic cause 
you can just imagine that an engine just apply expressions to buffered data in 
windowing function scan, and emit when expression eval are true 2) high 
extensible because it is expression.

Then build several fixed strategies as keywords and make them on top of EMIT 
expressions. For example, support "EMIT AFTER Watermark" and then translate 
this strategy to  "EMIT WHEN current_timestamp() >= window_end"  in plan.


The only uncertainty right now is how to model "watermark" as a part of SQL 
algebra. My thought is offer watermark functions. E.g. current_watermark() to 
get current watermark position.  In systems I have seen, there should be 
something like "watermark aggregator" that watermark functions can calls to. In 
Calcite it pretty much means we add a centralized watermark aggregator to track 
watermarks on each RelNode.

Finally, of course there are more details about how Calcite interfaces could be 
changed. I should draft a design doc for this.

> TUMBLE Table-valued Function
> ----------------------------
>
>                 Key: CALCITE-3272
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3272
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>          Time Spent: 19h 20m
>  Remaining Estimate: 0h
>
> Define a builtin TVF: Tumble (data , timecol , dur, [ offset ])
> The return value of Tumble is a relation that includes all columns of data as 
> well as additional event time columns window_start and window_end.
> Examples of TUMBLE TVF are (from https://s.apache.org/streaming-beam-sql ):
> 8:21> SELECT * FROM Bid;
> --------------------------
> | bidtime | price | item |
> --------------------------
> | 8:07    | $2    | A    |
> | 8:11    | $3    | B    |
> | 8:05    | $4    | C    |
> | 8:09    | $5    | D    |
> | 8:13    | $1    | E    |
> | 8:17    | $6    | F    |
> --------------------------
> 8:21> SELECT *
>       FROM TABLE Tumble (
>         data    => TABLE Bid ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES ,
>         offset  => INTERVAL '0' MINUTES );
> ------------------------------------------
> | window_start | window_end | bidtime | price | item |
> ------------------------------------------
> | 8:00   | 8:10 | 8:07    | $2    | A    |
> | 8:10   | 8:20 | 8:11    | $3    | B    |
> | 8:00   | 8:10 | 8:05    | $4    | C    |
> | 8:00   | 8:10 | 8:09    | $5    | D    |
> | 8:10   | 8:20 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:17    | $6    | F    |
> ------------------------------------------
> 8:21> SELECT MAX ( window_start ) , window_end , SUM ( price )
>       FROM TABLE Tumble (
>         data    => TABLE ( Bid ) ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10 ' MINUTES )
>       GROUP BY wend;
> -------------------------
> | window_start | window_end | price |
> -------------------------
> | 8:00   | 8:10 | $11   |
> | 8:10   | 8:20 | $10   |
> -------------------------



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to