+1

We are also quite interested in these features and would love to
participate and contribute.

~Haohui

On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi everybody,
>
> it seems that currently several contributors are working on new features
> for the streaming Table API / SQL around row windows (as defined in FLIP-11
> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
> FLINK-5584).
> Since these efforts overlap quite a bit I spent some time thinking about
> how we can approach these features and how to avoid overlapping
> contributions.
>
> The challenge here is the following. Some of the Table API row windows as
> defined by FLIP-11 [1] are basically SQL OVER windows while other cannot be
> easily expressed as such (TumbleRows for row-count intervals, SessionRows).
> However, since Calcite already supports SQL OVER windows, we can reuse the
> optimization logic for some of the Table API row windows. I also thought
> about the semantics of the TumbleRows and SessionRows windows as defined in
> FLIP-11 and came to the conclusion that these are not well defined in
> FLIP-11 and should rather be defined as SlideRows windows with a special
> PARTITION BY clause.
>
> I propose to approach SQL OVER windows and Table API row windows as
> follows:
>
> We start with three simple cases for SQL OVER windows (not Table API yet):
>
> * OVER RANGE for event time
> * OVER RANGE for processing time
> * OVER ROW for processing time
>
> All cases fulfill the following restrictions:
> - All aggregations in SELECT must refer to the same window.
> - PARTITION BY may not contain the rowtime attribute.
> - ORDER BY must be on rowtime attribute (for event time) or on a marker
> function that indicates processing time. Additional sort attributes are not
> supported initially.
> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> PRECEDING AND CURRENT ROW" are supported.
>
> OVER ROW for event time cannot be easily supported. With event time, we may
> have late records which need to be injected into the order of records. When
> a record in injected in to the order where a row-count window has already
> been computed, this and all following windows will change. We could either
> drop the record or sent out many retraction records. I think it is best to
> not open this can of worms at this point.
>
> The rational for all of the above restrictions is to have first versions of
> OVER windows soon.
> Once we have the above cases covered we can extend and remove limitations
> as follows:
>
> - Table API SlideRow windows (with the same restrictions as above). This
> will be mostly API work since the execution part has been solved before.
> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> - Add support for different windows in SELECT. All windows must be
> partitioned and ordered in the same way.
> - Add support for additional ORDER BY attributes (besides time).
>
> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not
> well defined, IMO.
> They can be expressed as SlideRows windows with special partitioning
> (partitioning on fixed, non-overlapping time ranges for TumbleRows, and
> gap-separated, non-overlapping time ranges for SessionRows)
> I would not start to work on those yet.
>
> I would like to close all related JIRA issues (FLINK-4678, FLINK-4679,
> FLINK-4680, FLINK-5584) and restructure the development of these features
> as outlined above with corresponding JIRA issues.
>
> What do others think? (I cc'ed the contributors assigned to the above JIRA
> issues)
>
> Best, Fabian
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
>

Reply via email to