+1 We are also quite interested in these features and would love to participate and contribute.
~Haohui On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske <fhue...@gmail.com> wrote: > Hi everybody, > > it seems that currently several contributors are working on new features > for the streaming Table API / SQL around row windows (as defined in FLIP-11 > [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680, > FLINK-5584). > Since these efforts overlap quite a bit I spent some time thinking about > how we can approach these features and how to avoid overlapping > contributions. > > The challenge here is the following. Some of the Table API row windows as > defined by FLIP-11 [1] are basically SQL OVER windows while other cannot be > easily expressed as such (TumbleRows for row-count intervals, SessionRows). > However, since Calcite already supports SQL OVER windows, we can reuse the > optimization logic for some of the Table API row windows. I also thought > about the semantics of the TumbleRows and SessionRows windows as defined in > FLIP-11 and came to the conclusion that these are not well defined in > FLIP-11 and should rather be defined as SlideRows windows with a special > PARTITION BY clause. > > I propose to approach SQL OVER windows and Table API row windows as > follows: > > We start with three simple cases for SQL OVER windows (not Table API yet): > > * OVER RANGE for event time > * OVER RANGE for processing time > * OVER ROW for processing time > > All cases fulfill the following restrictions: > - All aggregations in SELECT must refer to the same window. > - PARTITION BY may not contain the rowtime attribute. > - ORDER BY must be on rowtime attribute (for event time) or on a marker > function that indicates processing time. Additional sort attributes are not > supported initially. > - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x > PRECEDING AND CURRENT ROW" are supported. > > OVER ROW for event time cannot be easily supported. With event time, we may > have late records which need to be injected into the order of records. When > a record in injected in to the order where a row-count window has already > been computed, this and all following windows will change. We could either > drop the record or sent out many retraction records. I think it is best to > not open this can of worms at this point. > > The rational for all of the above restrictions is to have first versions of > OVER windows soon. > Once we have the above cases covered we can extend and remove limitations > as follows: > > - Table API SlideRow windows (with the same restrictions as above). This > will be mostly API work since the execution part has been solved before. > - Add support for FOLLOWING (except UNBOUNDED FOLLOWING) > - Add support for different windows in SELECT. All windows must be > partitioned and ordered in the same way. > - Add support for additional ORDER BY attributes (besides time). > > As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not > well defined, IMO. > They can be expressed as SlideRows windows with special partitioning > (partitioning on fixed, non-overlapping time ranges for TumbleRows, and > gap-separated, non-overlapping time ranges for SessionRows) > I would not start to work on those yet. > > I would like to close all related JIRA issues (FLINK-4678, FLINK-4679, > FLINK-4680, FLINK-5584) and restructure the development of these features > as outlined above with corresponding JIRA issues. > > What do others think? (I cc'ed the contributors assigned to the above JIRA > issues) > > Best, Fabian > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations >