Hi, Timo ~

> We are not forced by the standard to do it as stated in the `One SQL to Rule it all` paper

No, sticking to the SQL standard is always better. I think this is the common practice for Flink SQL now; without a standard, everyone can voice a preference and the discussion easily drifts too far apart.

> We can align the SQL windows more towards our regular DataStream API windows, where you keyBy first and then apply a window operator.

I don't think the current DataStream window join implements the window semantics correctly: it joins the data sets first and then windows the LHS and RHS data together, whereas each input should actually window its own data set separately.

As for "key by the data set first": the current window TVF only appends window attributes, so it is very lightweight and orthogonal, and we can combine window TVFs with additional joins, aggregations, TopN and so on. In SQL, people usually express "KEY BY" with a GROUP BY clause, which means we would strongly bind the window TVF and the aggregate operator together, and I would definitely vote -1 on that.

As for PARTITION BY, there are specific cases for the SESSION window, because a session often has a logical key. We can extend the PARTITION BY syntax because it is already in the SQL standard. But I'm confused why a TUMBLE window would have a PARTITION key. What is the real use case?

-1 for ORDER BY, because sorting an unbounded data set is not meaningful. For unbounded data sets we already have the watermark to handle out-of-order data, and for bounded data sets we can use a regular sort, because the current table argument actually allows any query.

Best,
Danny Chan

On October 15, 2020 at 5:16 PM (+0800), dev@flink.apache.org wrote:
>
> Personally, I find this easier to explain to users than telling them the
> difference why a session window has SET semantic input tables and
> tumble/sliding have ROW semantic input tables.
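
P.S. To make the "append window attributes, then compose" point concrete, here is a minimal pure-Python sketch of the semantics argued for above. It is not Flink code: `tumble`, `window_join`, `window_count`, and the sample `orders`/`payments` rows are all hypothetical names made up for illustration, and timestamps are plain integers. Each input is windowed separately, and the join or the aggregation is a separate step that simply uses the appended `window_start`/`window_end` columns.

```python
from collections import defaultdict


def tumble(rows, time_field, size):
    """Hypothetical stand-in for a tumbling window TVF.

    It only appends window_start/window_end to every row; it does not
    group, key, or sort anything.
    """
    out = []
    for row in rows:
        start = row[time_field] - row[time_field] % size
        out.append({**row, "window_start": start, "window_end": start + size})
    return out


def window_join(left, right, key):
    """Equi-join two inputs that were each windowed on their own."""
    index = defaultdict(list)
    for r in right:
        index[(r["window_start"], r["window_end"], r[key])].append(r)
    joined = []
    for l in left:
        lookup = (l["window_start"], l["window_end"], l[key])
        for r in index.get(lookup, []):
            joined.append((l, r))
    return joined


def window_count(rows, key):
    """A GROUP BY window_start, window_end, key applied as a separate step."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row["window_start"], row["window_end"], row[key])] += 1
    return dict(counts)


# Made-up sample data; "ts" is an integer event time.
orders = [{"id": "a", "ts": 3}, {"id": "a", "ts": 12}, {"id": "b", "ts": 7}]
payments = [{"id": "a", "ts": 8}, {"id": "b", "ts": 11}]

# Window each input separately, then compose with other operators.
w_orders = tumble(orders, "ts", 10)
w_payments = tumble(payments, "ts", 10)

pairs = window_join(w_orders, w_payments, "id")
counts = window_count(w_orders, "id")
```

Because `tumble` never groups anything itself, the same windowed input can feed the join and the aggregation alike, which is the orthogonality I would like to keep.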