inactive streams

Thomas Weise Thu, 06 Apr 2017 09:37:33 -0700

You would need to provide more specifics of the use case you are thinking
to address to make this a meaningful discussion.


An example for APEXCORE-408 (based on real batch use case): I have two
stages, first stage produces a set of files that second stage needs as
input. Stage 1 operators to be released and stage 2 operators deployed when
stage 2 starts. These can be independent operators, they don't need to be
connected through a stream.

Thomas


On Thu, Apr 6, 2017 at 9:21 AM, Vlad Rozov <[email protected]> wrote:

> It is not about a use case difference. My proposal and APEXCORE-408
> address the same use case - how to re-allocate resources for batch
> applications or applications where processing happens in stages. The
> difference between APEXCORE-408 and the proposal is shift in complexity
> from application logic to the platform. IMO, supporting batch applications
> using APEXCORE-408 will require more coding on the application side.
>
> Thank you,
>
> Vlad
>
>
> On 4/5/17 21:57, Thomas Weise wrote:
>
>> I think this needs more input on a use case level. The ability to
>> dynamically alter the DAG internally will also address the resource
>> allocation for operators:
>>
>> https://issues.apache.org/jira/browse/APEXCORE-408
>>
>> It can be used to implement stages of a batch pipeline and is very
>> flexible
>> in general. Considering the likely implementation complexity for the
>> proposed feature I would like to understand what benefits it provides to
>> the user (use cases that cannot be addressed otherwise)?
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Sat, Apr 1, 2017 at 12:23 PM, Vlad Rozov <[email protected]>
>> wrote:
>>
>> Correct, a statefull downstream operator can only be undeployed at a
>>> checkpoint window after it consumes all data emitted by upstream operator
>>> on the closed port.
>>>
>>> It will be necessary to distinguish between closed port and inactive
>>> stream. After port is closed, stream may still be active and after port
>>> is
>>> open, stream may still be inactive (not yet ready).
>>>
>>> The more contributors participate in the discussion and implementation,
>>> the more solid the feature will be.
>>>
>>> Thank you,
>>> Vlad
>>>
>>> Отправлено с iPhone
>>>
>>> On Apr 1, 2017, at 11:03, Pramod Immaneni <[email protected]>
>>>>
>>> wrote:
>>>
>>>> Generally a good idea. Care should be taken around fault tolerance and
>>>> idempotency. Close stream would need to stop accepting new data but
>>>> still
>>>> can't actually close all the streams and un-deploy operators till
>>>> committed. Idempotency might require the close stream to take effect at
>>>>
>>> the
>>>
>>>> end of the window. What would it then mean for re-opening streams within
>>>>
>>> a
>>>
>>>> window? Also, looks like a larger undertaking, as Ram suggested would be
>>>> good to understand the use cases and I also suggest that multiple folks
>>>> participate in the implementation effort to ensure that we are able to
>>>> address all the scenarios and minimize chances of regression in existing
>>>> behavior.
>>>>
>>>> Thanks
>>>>
>>>> On Sat, Apr 1, 2017 at 8:12 AM, Vlad Rozov <[email protected]>
>>>>>
>>>> wrote:
>>>
>>>> All,
>>>>>
>>>>> Currently Apex assumes that an operator can emit on any defined output
>>>>> port and all streams defined by a DAG are active. I'd like to propose
>>>>> an
>>>>> ability for an operator to open and close output ports. By default all
>>>>> ports defined by an operator will be open. In the case an operator for
>>>>>
>>>> any
>>>
>>>> reason decides that it will not emit tuples on the output port, it may
>>>>> close it. This will make the stream inactive and the application master
>>>>>
>>>> may
>>>
>>>> undeploy the downstream (for that input stream) operators. If this
>>>>>
>>>> leads to
>>>
>>>> containers that don't have any active operators, those containers may be
>>>>> undeployed as well leading to better cluster resource utilization and
>>>>> better Apex elasticity. Later, the operator may be in a state where it
>>>>> needs to emit tuples on the closed port. In this case, it needs to
>>>>>
>>>> re-open
>>>
>>>> the port and wait till the stream becomes active again before emitting
>>>>> tuples on that port. Making inactive stream active again, requires the
>>>>> application master to re-allocate containers and re-deploy the
>>>>>
>>>> downstream
>>>
>>>> operators.
>>>>>
>>>>> It should be also possible for an application designer to mark streams
>>>>>
>>>> as
>>>
>>>> inactive when an application starts. This will allow the application
>>>>>
>>>> master
>>>
>>>> avoid reserving all containers when the application starts. Later, the
>>>>>
>>>> port
>>>
>>>> can be open and inactive stream become active.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>>
>

Re: open/close ports and active/inactive streams

Reply via email to