Hi, Timo ~

> We are not forced by
> the standard to do it as stated in the `One SQL to Rule it all` paper

No, sticking to the SQL standard is always better. I think this is the common
practice for Flink SQL now: without a standard, everyone can voice a
preference and the discussion would easily drift too far apart.

> We can align the SQL windows more towards our regular DataStream API
> windows, where you keyBy first and then apply a window operator.

I don't think the current DataStream window join implements the window
semantics correctly: it joins the data sets first and then windows the LHS and
RHS data together, whereas each input should actually window its data set
separately.
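
To make the intended semantics concrete, here is a minimal sketch in plain
Python (this is not Flink API; `assign_tumble`, `window_join` and the
`(key, ts)` row layout are hypothetical names used only for illustration).
Each input is windowed on its own first, and only then are rows joined on the
key plus identical window bounds.

```python
def assign_tumble(rows, size):
    """Append (window_start, window_end) to each (key, ts) row of one input."""
    out = []
    for key, ts in rows:
        start = ts - ts % size
        out.append((key, ts, start, start + size))
    return out


def window_join(left, right, size):
    """Window each input separately, then join on key and equal window bounds."""
    lhs = assign_tumble(left, size)
    rhs = assign_tumble(right, size)
    return [
        (lk, lts, rts, lstart, lend)
        for lk, lts, lstart, lend in lhs
        for rk, rts, rstart, rend in rhs
        if lk == rk and (lstart, lend) == (rstart, rend)
    ]
```

With a window size of 10, a left row at ts=1 and a right row at ts=3 with the
same key fall into the same window [0, 10) and are joined, while rows in
different windows are not.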

As for "key by the data set first": the current window TVF only appends the
window attributes, so it is very lightweight and orthogonal, and we can combine
window TVFs with additional joins, aggregations, TopN and so on.
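
A rough sketch of that composability, in plain Python rather than Flink SQL
(the `tumble` and `sum_by` helpers and the bid rows are made up for
illustration): the windowing step only adds two columns, and the grouping step
is applied afterwards as an independent operation that knows nothing about
windows.

```python
from collections import defaultdict


def tumble(rows, time_col, size):
    """Return copies of rows with window_start/window_end appended; nothing else changes."""
    result = []
    for row in rows:
        start = row[time_col] - row[time_col] % size
        result.append({**row, "window_start": start, "window_end": start + size})
    return result


def sum_by(rows, group_cols, value_col):
    """Generic aggregation: groups on the given columns and sums value_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in group_cols)] += row[value_col]
    return dict(totals)


# Hypothetical input rows.
bids = [
    {"item": "A", "price": 4, "ts": 1},
    {"item": "B", "price": 2, "ts": 7},
    {"item": "A", "price": 5, "ts": 13},
]
windowed = tumble(bids, "ts", 10)
per_window = sum_by(windowed, ["window_start", "window_end"], "price")
```

The same `windowed` rows could just as well feed a join or a TopN step, since
the window bounds are ordinary columns at that point.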

In SQL, people usually express "KEY BY" with a "GROUP BY" clause, which means
we would strongly bind the window TVF and the aggregate operator together. I
would definitely vote -1 on that.

As for PARTITION BY, there are specific cases for the "SESSION" window,
because a session often has a logical key. We can extend the PARTITION BY
syntax because it is already in the SQL standard, but I'm confused about why a
Tumble window would have a PARTITION key. What is the real use case?

-1 for "ORDER BY", because sorting an unbounded data set has no meaning.
For unbounded data sets we already have the watermark to handle out-of-order
data, and for bounded data sets we can use a regular sort, because the current
table argument actually allows any query.

Best,
Danny Chan
On Oct 15, 2020, 5:16 PM +0800, dev@flink.apache.org wrote:
>
> Personally, I find this easier to explain to users than telling them the
> difference why a session window has SET semantic input tables and
> tumble/sliding have ROW semantic input tables.
