Re: [DISCUSSION] Custom Control Tuples

David Yan Fri, 24 Jun 2016 23:42:58 -0700

It looks like option 1 is preferred by the community. But let me elaborate
on why I brought up the option of piggybacking on BEGIN_WINDOW and
END_WINDOW.


Option 2 implicitly enforces that the operations related to the custom
control tuple be done at the streaming window boundary.

For most operations, it makes sense to have that enforcement. Option 1
opens the door to the possibility of sending and handling control tuples
within a window, thus imposing a challenge of ensuring idempotency. In
fact, allowing that would make idempotency extremely difficult to achieve.
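
A minimal sketch of the window-boundary behavior described above, in plain
Java. None of these names come from Apex: WindowAlignedOperator,
processControl and ControlTupleDemo are hypothetical stand-ins, and a String
prefix stands in for whatever operator state a control tuple would change.
The operator buffers a control tuple that arrives mid-window and applies it
only in endWindow(), so the output does not depend on where in the window
the control tuple arrived.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operator, not Apex API: applies control tuples only at the window boundary. */
class WindowAlignedOperator {
    private final List<String> pendingControl = new ArrayList<>();
    private final List<String> output = new ArrayList<>();
    private String prefix = "a"; // operator state that a control tuple changes

    void beginWindow(long windowId) {
        // Nothing to do in this sketch; state changes are applied in endWindow().
    }

    void processData(String tuple) {
        output.add(prefix + ":" + tuple);
    }

    /** Assumed callback: only buffers the control tuple, never applies it mid-window. */
    void processControl(String newPrefix) {
        pendingControl.add(newPrefix);
    }

    void endWindow() {
        // Apply buffered control tuples in arrival order at the boundary.
        for (String p : pendingControl) {
            prefix = p;
        }
        pendingControl.clear();
    }

    List<String> output() {
        return output;
    }
}

class ControlTupleDemo {
    /** Runs two windows; the control tuple arrives early or late within window 1. */
    static List<String> run(boolean controlEarly) {
        WindowAlignedOperator op = new WindowAlignedOperator();
        op.beginWindow(1);
        op.processData("x");
        if (controlEarly) {
            op.processControl("b");
        }
        op.processData("y");
        if (!controlEarly) {
            op.processControl("b");
        }
        op.endWindow();

        op.beginWindow(2);
        op.processData("z");
        op.endWindow();
        return op.output();
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

Both calls yield the same list, because the prefix change takes effect only
after window 1 ends. If processControl() changed the prefix immediately, the
tuple "y" would be emitted differently in the two runs, which is the kind of
replay divergence that makes idempotency hard.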

David

On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <[email protected]> wrote:

> +1 for option 1.
>
> Thank you,
>
> Vlad
>
>
> On 6/24/16 14:35, Bright Chen wrote:
>
>> +1
>> It can also help to shut down the application gracefully.
>> Bright
>>
>> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <[email protected]> wrote:
>>>
>>> +1
>>>
>>> I think it's good to have custom control tuples, and I prefer option 1.
>>>
>>> Also, I think we should consider a couple of different callbacks, which
>>> could be operator level (triggered when an operator receives a control
>>> tuple) or DAG level (triggered when a control tuple has flowed through
>>> the whole DAG).
>>>
>>> Regards,
>>> Siyuan
>>>
>>>
>>>
>>>
>>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <[email protected]>
>>> wrote:
>>>
>>>> My initial thinking is that the custom control tuples, just like the
>>>> existing control tuples, will only be generated from the input operators
>>>> and will be propagated downstream to all operators in the DAG. So the
>>>> NxM partitioning scenario works just like how other control tuples work,
>>>> i.e. the callback will not be called unless all ports have received the
>>>> control tuple for a particular window. This creates a little bit of
>>>> complication with multiple input operators though.
>>>>
>>>> David
>>>>
>>>>
>>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <[email protected]> wrote:
>>>>
>>>>> +1 for the feature.
>>>>>
>>>>> I am in favor of option 1, but we may need a helper method to avoid a
>>>>> compiler error on typed ports, since calling port.emit(controlTuple)
>>>>> will be an error if the types of the control tuple and the port do not
>>>>> match. Alternatively, we could add a new method to the output port
>>>>> object, emitControlTuple(ControlTuple).
>>>>>
>>>>> Can you give an example of piggybacking a tuple on the current
>>>>> BEGIN_WINDOW and END_WINDOW control tuples?
>>>>>
>>>>> In the case of NxM partitioning, each downstream operator will receive
>>>>> N control tuples. Will it call the user handler N times for each
>>>>> downstream operator, or just once?
>>>>>
>>>>> Regards,
>>>>> - Tushar.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to propose a new feature to the Apex core engine -- the
>>>>>> support of custom control tuples. Currently, we have control tuples
>>>>>> such as BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't
>>>>>> have the support for applications to insert their own control tuples.
>>>>>> The way currently to get around this is to use data tuples and have a
>>>>>> separate port for such tuples that sends tuples to all partitions of
>>>>>> the downstream operators, which is not exactly developer friendly.
>>>>>>
>>>>>> We have already seen a number of use cases that can use this feature:
>>>>>>
>>>>>> 1) Batch support: We need to tell all operators of the physical DAG
>>>>>> when a batch starts and ends, so the operators can do whatever that is
>>>>>> needed upon the start or the end of a batch.
>>>>>>
>>>>>> 2) Watermark: To support the concepts of event time windowing, the
>>>>>> watermark control tuple is needed to tell which windows should be
>>>>>> considered late.
>>>>>>
>>>>>> 3) Changing operator properties: We do have the support of changing
>>>>>> operator properties on the fly, but with a custom control tuple, the
>>>>>> command to change operator properties can be window aligned for all
>>>>>> partitions and also across the DAG.
>>>>>>
>>>>>> 4) Recording tuples: Like changing operator properties, we do have
>>>>>> this support now but only at the individual physical operator level,
>>>>>> and without control of which window to record tuples for. With a
>>>>>> custom control tuple, because a control tuple must belong to a window,
>>>>>> all operators in the DAG can start (and stop) recording for the same
>>>>>> windows.
>>>>>>
>>>>>> I can think of two options to achieve this:
>>>>>>
>>>>>> 1) A new custom control tuple type that takes the user's serializable
>>>>>> object.
>>>>>>
>>>>>> 2) Piggyback on the current BEGIN_WINDOW and END_WINDOW control tuples.
>>>>>>
>>>>>> Please provide your feedback. Thank you.
>>>>>>
>>>>>> David
>>>>>>
>>>>>
>
