Making Siyuan's diagram fixed width so we can actually see it :)
                   +-------------------+
       +---------> | WindowedOperator  | <--------+
       |           +--------+----------+          |
       |                    ^      ^--------------------------------+
       |                    |                     |                 |
       |                    |                     |                 |
+------+--------+    +------+------+      +-------+-----+    +------+-----+
|CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
+---------------+    +-------------+      +------+------+    +-----+------+
                                  +---------^    ^                 ^
                                  |              |                 |
                         +--------+---+    +-----+----+       +----+----+
                         |KeyedCombine|    |KeyedGroup|       | CoGroup |
                         +------------+    +----------+       +---------+
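
For anyone who prefers code to boxes, here is how I read the arrows, as a
rough Java skeleton. Only the class names and the parent/child links come
from the diagram; making the classes abstract and leaving them empty is my
own assumption, not something the ticket specifies.

```java
// Hypothetical skeleton: names and inheritance are taken from the diagram,
// everything else (abstract, empty bodies) is a placeholder.
abstract class WindowedOperator {}

// Direct children of WindowedOperator.
abstract class CombineOperator extends WindowedOperator {}
abstract class GroupOperator extends WindowedOperator {}
abstract class KeyedOperator extends WindowedOperator {}
abstract class JoinOperator extends WindowedOperator {}

// Keyed variants hang off KeyedOperator; CoGroup hangs off JoinOperator.
abstract class KeyedCombine extends KeyedOperator {}
abstract class KeyedGroup extends KeyedOperator {}
abstract class CoGroup extends JoinOperator {}
```
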
On Wed, May 11, 2016 at 11:47 AM, Siyuan Hua <[email protected]> wrote:
> Hi guys,
>
> I just created a ticket for windowed operators. It is a prerequisite for
> the High-level API and Beam integration.
>
> As per our several recent discussions in the community, a group of windowed
> operators that delivers window semantics following the Google Dataflow
> model (https://cloud.google.com/dataflow/) is very important.
> The operators should be designed and implemented to support:
>   the High-level API
>   Beam translation
>   easy use with other popular operators
>
> Hierarchy of the operators:
> The windowed operators should cover all possible transformations that
> require a window. Batch processing is also covered, as a special window
> called the global window.
>
>                    +-------------------+
>        +---------> | WindowedOperator  | <--------+
>        |           +--------+----------+          |
>        |                    ^      ^--------------------------------+
>        |                    |                     |                 |
>        |                    |                     |                 |
> +------+--------+    +------+------+      +-------+-----+    +------+-----+
> |CombineOperator|    |GroupOperator|      |KeyedOperator|    |JoinOperator|
> +---------------+    +-------------+      +------+------+    +-----+------+
>                                   +---------^    ^                 ^
>                                   |              |                 |
>                          +--------+---+    +-----+----+       +----+----+
>                          |KeyedCombine|    |KeyedGroup|       | CoGroup |
>                          +------------+    +----------+       +---------+
>
>
> Combine operations include all operations that combine all tuples in one
> window into one or a small number of tuples. Group operations group all
> tuples in one window. Join and CoGroup are used to join and group tuples
> from different inputs.
>
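
To make the Combine vs. Group distinction concrete, here is a minimal
sketch over the tuples of a single window. `WindowOps`, `combine` and
`group` are hypothetical names for illustration only, not part of any
existing operator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper, not an existing API: contrasts Combine and Group
// applied to the tuples collected for one window.
final class WindowOps {
    private WindowOps() {}

    // Combine: fold every tuple in the window into a single result.
    static <T> T combine(List<T> windowTuples, T identity, BinaryOperator<T> fn) {
        T acc = identity;
        for (T tuple : windowTuples) {
            acc = fn.apply(acc, tuple);
        }
        return acc;
    }

    // Group: keep every tuple in the window and emit them together.
    static <T> List<T> group(List<T> windowTuples) {
        return new ArrayList<>(windowTuples);
    }
}
```

With a window holding 1, 2, 3, `combine(w, 0, Integer::sum)` yields one
value while `group(w)` yields the three tuples as one collection.
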
> Components:
> Window component: it includes configuration, window state that should be
> checkpointed, etc. It should support NonMergibleWindow (fixed or sliding)
> and MergibleWindow (session).
>
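
A rough sketch of what window assignment could look like for the three
window kinds mentioned. All names here (`WindowAssign` and its methods) are
made up for illustration, and the half-open `[start, end)` convention is an
assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
// Windows are assumed to be half-open intervals [start, end).
final class WindowAssign {
    private WindowAssign() {}

    // Fixed window: each timestamp falls into exactly one window
    // [start, start + size). floorDiv keeps negative timestamps correct.
    static long fixedWindowStart(long timestamp, long size) {
        return Math.floorDiv(timestamp, size) * size;
    }

    // Sliding window: a timestamp belongs to every window of length `size`
    // whose start is a multiple of `slide` and which covers the timestamp.
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = Math.floorDiv(timestamp, slide) * slide;
        for (long start = latest; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Session window: every timestamp t opens a proto-window [t, t + gap);
    // overlapping proto-windows are merged. Input must be sorted ascending.
    static List<long[]> sessionWindows(long[] sortedTimestamps, long gap) {
        List<long[]> sessions = new ArrayList<>();
        for (long t : sortedTimestamps) {
            long[] current = sessions.isEmpty() ? null : sessions.get(sessions.size() - 1);
            if (current != null && t < current[1]) {
                current[1] = t + gap;                  // merge: extend the open session
            } else {
                sessions.add(new long[] {t, t + gap}); // start a new session
            }
        }
        return sessions;
    }
}
```

The merge step is the reason session windows need extra state: the window
a tuple belongs to can change after later tuples arrive, which is not the
case for fixed or sliding windows.
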
> Trigger:
> It should support early and late triggers, as well as customizable
> trigger behaviour.
>
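
One possible shape for early/late firing, as a sketch only. I am assuming
an early firing after every N tuples, one on-time firing when the watermark
reaches the end of the window, and a late firing for each tuple that
arrives afterwards. `WindowTrigger` and `Firing` are hypothetical names,
and a real trigger would presumably be configurable well beyond this.

```java
import java.util.Optional;

// Hypothetical names; the firing policy below is an assumption.
enum Firing { EARLY, ON_TIME, LATE }

final class WindowTrigger {
    private final long windowEnd;     // exclusive end of the window
    private final int earlyEveryN;    // fire an early pane every N tuples
    private int tuplesSinceFiring = 0;
    private boolean onTimeFired = false;

    WindowTrigger(long windowEnd, int earlyEveryN) {
        this.windowEnd = windowEnd;
        this.earlyEveryN = earlyEveryN;
    }

    // Called for each tuple assigned to this window.
    Optional<Firing> onTuple() {
        tuplesSinceFiring++;
        if (onTimeFired) {
            // The watermark already passed the window end: late data.
            tuplesSinceFiring = 0;
            return Optional.of(Firing.LATE);
        }
        if (tuplesSinceFiring >= earlyEveryN) {
            tuplesSinceFiring = 0;
            return Optional.of(Firing.EARLY);
        }
        return Optional.empty();
    }

    // Called whenever a new watermark arrives.
    Optional<Firing> onWatermark(long watermark) {
        if (!onTimeFired && watermark >= windowEnd) {
            onTimeFired = true;
            tuplesSinceFiring = 0;
            return Optional.of(Firing.ON_TIME);
        }
        return Optional.empty();
    }
}
```
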
> Outside components:
> Watermark generator: can be plugged into an input source to generate
> watermarks.
> Tuple util components: the operators should work only on tuples with a
> schema. They can either handle a predefined tuple type or provide a
> declarative API to describe a user-defined tuple class.
>
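
As one example of a pluggable watermark generator, here is a sketch that
trails the largest event timestamp seen so far by a fixed bound. The class
name and the bounded-out-of-orderness policy are my assumptions; the ticket
does not prescribe how watermarks are computed.

```java
// Hypothetical generator: watermark = max event time seen - allowed disorder.
final class BoundedLatenessWatermark {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedLatenessWatermark(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Called by the input source for every tuple it reads.
    void observe(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
    }

    // Current watermark. It never moves backwards, because the maximum
    // timestamp seen never decreases.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing observed yet
        }
        return maxTimestampSeen - maxOutOfOrderness;
    }
}
```

An out-of-order tuple (say timestamp 8 after 10 has been seen) leaves the
watermark unchanged, so downstream windows are not closed any earlier or
later because of it.
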
> Most component APIs should be reused in the High-Level API.
>
> This is the umbrella ticket; separate tickets will be created for the
> individual components and operators.
>