[ https://issues.apache.org/jira/browse/CALCITE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166218#comment-17166218 ]

Rui Wang commented on CALCITE-3272:
-----------------------------------

[~liupengcheng]

That should be controlled by an emit (or trigger) strategy, and there should be
various strategies that users can specify. Emitting when the watermark passes
the end of the window means emitting when the data is believed to be complete
(per Google Cloud Dataflow's and Apache Beam's definition of data completeness).
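As a minimal sketch of the "watermark passes the end of the window" rule, assuming windows are half-open intervals [start, end) and a watermark W means no more events with timestamp < W are expected. `Window` and `ready_to_emit` are hypothetical names for illustration, not Calcite or Beam APIs:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    start: int  # inclusive event-time bound (e.g. minutes since midnight)
    end: int    # exclusive event-time bound


def ready_to_emit(windows: List[Window], watermark: int) -> List[Window]:
    # The watermark asserts that no events with timestamp < watermark are
    # still expected, so [start, end) is believed complete once watermark >= end.
    return [w for w in windows if watermark >= w.end]
```

For example, with windows [8:00, 8:10) and [8:10, 8:20) expressed as minutes (480-490 and 490-500), a watermark of 8:15 (495) makes only the first window ready to emit.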

There should be at least three categories of emit strategies:
1. Event-time semantics: emit elements based on the relationship between the
watermark and event timestamps. Late-data handling also falls into this category.
2. Processing-time semantics, e.g. emit every x seconds/minutes/hours/days.
3. Data-driven semantics, e.g. emit when x elements are buffered.
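A rough sketch of the three categories as firing predicates, assuming a hypothetical per-window `PaneState` that a runner might track (this is illustrative only, not an existing Calcite or Beam structure):

```python
from dataclasses import dataclass


@dataclass
class PaneState:
    """Hypothetical per-window bookkeeping a runner might keep."""
    window_end: int  # event-time end of the window (exclusive)
    watermark: int   # current event-time watermark
    now: int         # current processing time
    last_emit: int   # processing time of the last emission
    buffered: int    # elements buffered since the last emission


def event_time_trigger(state: PaneState) -> bool:
    # 1. Event-time semantics: fire once the watermark passes the window end.
    return state.watermark >= state.window_end


def processing_time_trigger(every: int):
    # 2. Processing-time semantics: fire every `every` units of wall-clock time.
    def fire(state: PaneState) -> bool:
        return state.now - state.last_emit >= every
    return fire


def count_trigger(n: int):
    # 3. Data-driven semantics: fire once at least n elements are buffered.
    def fire(state: PaneState) -> bool:
        return state.buffered >= n
    return fire
```

Each predicate only looks at the part of the state its category cares about, which keeps them independent and easy to combine.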

Of course, composite strategies can also be built from the above.
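One way composition could look, sketched with plain predicates over a state dict. The names `after_watermark`, `after_count`, `any_of` and `all_of` are hypothetical; they are only loosely modeled on Beam's composite triggers such as `AfterFirst` and `AfterAll`, which additionally track which sub-triggers have already fired:

```python
from typing import Callable, Dict

# Hypothetical trigger type: a predicate over a per-window state dict.
Trigger = Callable[[Dict[str, int]], bool]


def after_watermark(state: Dict[str, int]) -> bool:
    # Event-time leaf: fire once the watermark reaches the window end.
    return state["watermark"] >= state["window_end"]


def after_count(n: int) -> Trigger:
    # Data-driven leaf: fire once n elements are buffered.
    return lambda state: state["buffered"] >= n


def any_of(*triggers: Trigger) -> Trigger:
    # Composite: fire when at least one child would fire.
    return lambda state: any(t(state) for t in triggers)


def all_of(*triggers: Trigger) -> Trigger:
    # Composite: fire only when every child would fire.
    return lambda state: all(t(state) for t in triggers)


# Fire early once 100 elements are buffered, or when the window is complete.
early_or_complete = any_of(after_count(100), after_watermark)
```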

My current idea is to add an EMIT clause for defining emit strategies.

> TUMBLE Table-valued Function
> ----------------------------
>
>                 Key: CALCITE-3272
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3272
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>          Time Spent: 19h 20m
>  Remaining Estimate: 0h
>
> Define a built-in TVF: Tumble (data, timecol, dur [, offset])
> The return value of Tumble is a relation that includes all columns of data as 
> well as additional event time columns window_start and window_end.
> Examples of TUMBLE TVF are (from https://s.apache.org/streaming-beam-sql ):
> 8:21> SELECT * FROM Bid;
> --------------------------
> | bidtime | price | item |
> --------------------------
> | 8:07    | $2    | A    |
> | 8:11    | $3    | B    |
> | 8:05    | $4    | C    |
> | 8:09    | $5    | D    |
> | 8:13    | $1    | E    |
> | 8:17    | $6    | F    |
> --------------------------
> 8:21> SELECT *
>       FROM TABLE Tumble (
>         data    => TABLE Bid ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES ,
>         offset  => INTERVAL '0' MINUTES );
> ------------------------------------------------------
> | window_start | window_end | bidtime | price | item |
> ------------------------------------------------------
> | 8:00         | 8:10       | 8:07    | $2    | A    |
> | 8:10         | 8:20       | 8:11    | $3    | B    |
> | 8:00         | 8:10       | 8:05    | $4    | C    |
> | 8:00         | 8:10       | 8:09    | $5    | D    |
> | 8:10         | 8:20       | 8:13    | $1    | E    |
> | 8:10         | 8:20       | 8:17    | $6    | F    |
> ------------------------------------------------------
> 8:21> SELECT MAX ( window_start ) , window_end , SUM ( price )
>       FROM TABLE Tumble (
>         data    => TABLE ( Bid ) ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES )
>       GROUP BY window_end;
> -------------------------------------
> | window_start | window_end | price |
> -------------------------------------
> | 8:00         | 8:10       | $11   |
> | 8:10         | 8:20       | $10   |
> -------------------------------------
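For readers who want to check the example tables, here is a minimal in-memory sketch of the TUMBLE semantics described in the issue. It is not Calcite's implementation; it assumes times are integer minutes since midnight and that each row lands in the half-open window [window_start, window_end) aligned to offset + k * dur. `tumble` and `hhmm` are hypothetical helper names:

```python
from collections import defaultdict


def tumble(rows, timecol, dur, offset=0):
    """Return each row extended with window_start and window_end.

    A row with time t falls into [start, start + dur), where start is the
    largest value <= t of the form offset + k * dur.
    """
    out = []
    for row in rows:
        t = row[timecol]
        start = t - (t - offset) % dur  # Python's % is non-negative for dur > 0
        out.append({"window_start": start, "window_end": start + dur, **row})
    return out


def hhmm(minutes):
    return f"{minutes // 60}:{minutes % 60:02d}"


# The Bid table from the description, with times as minutes since midnight.
bids = [
    {"bidtime": 8 * 60 + 7, "price": 2, "item": "A"},
    {"bidtime": 8 * 60 + 11, "price": 3, "item": "B"},
    {"bidtime": 8 * 60 + 5, "price": 4, "item": "C"},
    {"bidtime": 8 * 60 + 9, "price": 5, "item": "D"},
    {"bidtime": 8 * 60 + 13, "price": 1, "item": "E"},
    {"bidtime": 8 * 60 + 17, "price": 6, "item": "F"},
]

windowed = tumble(bids, "bidtime", dur=10)

# Same grouping as GROUP BY window_end (window_start is determined by it).
totals = defaultdict(int)
for r in windowed:
    totals[(r["window_start"], r["window_end"])] += r["price"]

for (start, end), price in sorted(totals.items()):
    print(hhmm(start), hhmm(end), f"${price}")
# prints:
# 8:00 8:10 $11
# 8:10 8:20 $10
```

Computing the start with a modulo keeps every row's assignment independent of the others, which is what makes TUMBLE expressible as a row-by-row table function before any aggregation happens.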



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
