I think this could solve the dynamic property change via StatsListener as well.

Thanks
- Gaurav

> On Sep 28, 2015, at 11:09 AM, Gaurav Gupta <[email protected]> wrote:
> 
> Pramod,
> 
> Here is what I was thinking. Currently the property value change happens
> at window boundaries: Stram sends these OperatorRequests to individual
> instances. If there are multiple instances of the same operator and there is
> a property change request on this operator, send the OperatorRequest to the
> instance that is farthest ahead and wait for the other instances to reach
> that window id before sending the OperatorRequest to them. With this, you
> wouldn't need an additional special tuple, right?
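
A rough sketch of the bookkeeping this would need, assuming the master can see the current window id of each instance. The class and method names are hypothetical and are not part of Apex or Stram; the sketch only picks the target window and decides which instances can be sent the request now.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names are illustrative and not Apex/Stram APIs.
final class FarthestInstanceScheduler {

    /** Returns the window id of the instance that is farthest ahead. */
    static long targetWindow(Map<String, Long> currentWindowByInstance) {
        if (currentWindowByInstance.isEmpty()) {
            throw new IllegalArgumentException("no operator instances");
        }
        long target = Long.MIN_VALUE;
        for (long window : currentWindowByInstance.values()) {
            target = Math.max(target, window);
        }
        return target;
    }

    /**
     * Returns the instances (sorted by name) that have reached the target
     * window and can therefore be sent the request now. The rest stay
     * pending until a later call sees them at the target window.
     */
    static List<String> readyInstances(Map<String, Long> currentWindowByInstance, long target) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(currentWindowByInstance).entrySet()) {
            if (e.getValue() >= target) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }
}
```

The master would call readyInstances again as the slower instances report progress, sending the request to each one only once it reaches the target window.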
> 
> Does it make sense?
> 
> Thanks
> - Gaurav
> 
>> On Sep 28, 2015, at 10:43 AM, Pramod Immaneni <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> If OperatorRequest is out-of-band to the dataflow, which I think it is, then
>> it would most probably not be the mechanism to relay the property change. We
>> would possibly expose the proposed property change in an API that could be
>> used by StatsListener.
>> 
>> On Mon, Sep 28, 2015 at 10:40 AM, Gaurav Gupta <[email protected] 
>> <mailto:[email protected]>>
>> wrote:
>> 
>>> Pramod,
>>> 
>>> How would dynamic property change using OperatorRequest as part of
>>> StatsListener work with the new approach?
>>> 
>>> Thanks
>>> - Gaurav
>>> 
>>>> On Sep 28, 2015, at 10:30 AM, Pramod Immaneni <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> One optimization: the steps below are needed only when there is more than
>>>> one input operator. In the more common case of a single input operator,
>>>> the property change tuple can be inserted at the next possible window
>>>> without having to temporarily pause the flow.
>>>> 
>>>> On Mon, Sep 28, 2015 at 10:27 AM, Timothy Farkas <[email protected] 
>>>> <mailto:[email protected]>>
>>>> wrote:
>>>> 
>>>>> Furthermore, this approach is not limited to DAGs with a single input
>>>>> operator. In the case where a DAG has multiple input operators, property
>>>>> changes can be set within the same window across all input operators by
>>>>> enforcing some synchronization at the input operator level when setting
>>>>> the property. This synchronization would look like the following:
>>>>> 
>>>>>  1. When receiving a property change request, ask all input operators to
>>>>>     stop and send their current window.
>>>>>  2. Take the max window + 1 (not technically correct but you get the
>>>>>     idea).
>>>>>  3. Send the property change request to all the input operators and tell
>>>>>     them to apply the change at the maximum window id + 1.
>>>>>  4. Resume the input operators.
>>>>> 
>>>>> This ensures that the change is applied at the same window id and also
>>>>> ensures that the change is applied at a window id that the input operator
>>>>> has never played before. Therefore property changes will not interfere
>>>>> with the idempotence of operators.
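
The four synchronization steps could be sketched roughly as below. This is only an illustration under assumptions: InputOperatorHandle, PropertyChangeCoordinator and RecordingInput are made-up names, not Apex classes, and the plain "+ 1" on the window id mirrors the simplification already noted in step 2.

```java
import java.util.List;

// Hypothetical sketch; none of these types exist in Apex.
final class PropertyChangeCoordinator {

    /** Minimal stand-in for an input operator that can be paused and resumed. */
    interface InputOperatorHandle {
        long pauseAndGetCurrentWindow();                                        // step 1
        void scheduleChange(String property, String value, long applyAtWindow); // step 3
        void resume();                                                          // step 4
    }

    /** Returns the window id at which every input operator applies the change. */
    static long applyAtCommonWindow(List<? extends InputOperatorHandle> inputs,
                                    String property, String value) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no input operators");
        }
        try {
            // Step 1: stop every input operator and collect its current window.
            long maxWindow = Long.MIN_VALUE;
            for (InputOperatorHandle in : inputs) {
                maxWindow = Math.max(maxWindow, in.pauseAndGetCurrentWindow());
            }
            // Step 2: a window that no input operator has played yet
            // (simplified: real window ids may not be plain counters).
            long target = maxWindow + 1;
            // Step 3: tell every input operator to apply the change at that window.
            for (InputOperatorHandle in : inputs) {
                in.scheduleChange(property, value, target);
            }
            return target;
        } finally {
            // Step 4: resume the input operators, even if scheduling failed.
            for (InputOperatorHandle in : inputs) {
                in.resume();
            }
        }
    }

    /** In-memory stand-in used only to exercise the sketch. */
    static final class RecordingInput implements InputOperatorHandle {
        final long currentWindow;
        boolean paused;
        long applyAtWindow = -1;

        RecordingInput(long currentWindow) { this.currentWindow = currentWindow; }

        public long pauseAndGetCurrentWindow() { paused = true; return currentWindow; }

        public void scheduleChange(String property, String value, long applyAtWindow) {
            this.applyAtWindow = applyAtWindow;
        }

        public void resume() { paused = false; }
    }
}
```

With two inputs at windows 10 and 12, both would be told to apply the change at window 13, which neither has played yet.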
>>>>> 
>>>>> 
>>>>> On Mon, Sep 28, 2015 at 9:17 AM, Pramod Immaneni <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>>> Apex supports modification of operator properties at runtime, but the
>>>>>> current implementation has the following shortcomings.
>>>>>> 
>>>>>> 1. The property is not set across all partitions on the same window, as
>>>>>> individual partitions can be on different windows when the property
>>>>>> change is initiated from the client, resulting in inconsistency of data
>>>>>> for those windows. I am being generous using the word inconsistent.
>>>>>> 2. Sometimes properties need to be set on more than one logical operator
>>>>>> at the same time to achieve the change the user is seeking. Today they
>>>>>> will be two separate changes happening on two different windows, again
>>>>>> resulting in inconsistent data for some windows. These would need to
>>>>>> happen as a single transaction.
>>>>>> 3. If there is an operator failure before a committed checkpoint after an
>>>>>> operator property is dynamically changed, the operator will restart with
>>>>>> the old property and the change will not be re-applied.
>>>>>> 
>>>>>> Tim and I did some brainstorming and we have a proposal to overcome
>>>>>> these shortcomings. The main problem in all the above cases is that the
>>>>>> property changes happen out-of-band of the data flow and hence
>>>>>> independent of windowing. The proposal is to bring the property change
>>>>>> requests into the in-band dataflow so that they are handled consistently
>>>>>> with windowing and in a distributed manner.
>>>>>> 
>>>>>> The idea is to inject a special property change tuple, containing the
>>>>>> property changes and the identification information of the operators
>>>>>> they affect, into the dataflow at the input operator. The tuple will be
>>>>>> injected at a window boundary, after end window and before begin window,
>>>>>> and as this tuple flows through the DAG the intended operators'
>>>>>> properties will be modified. They will all be modified consistently at
>>>>>> the same window. The tuple can contain more than one property change for
>>>>>> more than one logical operator, and the changes will be applied
>>>>>> consistently to the different logical operators at the same window. In
>>>>>> case of failure, the replay of tuples will ensure that the property
>>>>>> change gets reapplied at the correct window.
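
To make the in-band idea concrete, here is a toy sketch, not the proposed implementation. PropertyChangeTuple, MultiplyOperator and the "factor" property are invented for illustration, and a window is modeled simply as an int[] of data, with the control tuple placed between two windows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Apex.
final class InBandPropertyChange {

    /** Control tuple: operator name -> (property name -> new value). */
    static final class PropertyChangeTuple {
        final Map<String, Map<String, String>> changes = new HashMap<>();

        PropertyChangeTuple set(String operator, String property, String value) {
            changes.computeIfAbsent(operator, k -> new HashMap<>()).put(property, value);
            return this;
        }
    }

    /** Toy operator that multiplies each value by its "factor" property. */
    static final class MultiplyOperator {
        final String name;
        int factor = 1;

        MultiplyOperator(String name) { this.name = name; }

        /** Applies only the changes addressed to this operator. */
        void onPropertyChange(PropertyChangeTuple tuple) {
            Map<String, String> mine = tuple.changes.get(name);
            if (mine != null && mine.containsKey("factor")) {
                factor = Integer.parseInt(mine.get("factor"));
            }
        }

        int process(int value) { return value * factor; }
    }

    /**
     * Plays a stream whose elements are either an int[] (one window of data)
     * or a PropertyChangeTuple sitting on the boundary between two windows.
     */
    static List<Integer> run(List<Object> stream, MultiplyOperator op) {
        List<Integer> out = new ArrayList<>();
        for (Object item : stream) {
            if (item instanceof PropertyChangeTuple) {
                op.onPropertyChange((PropertyChangeTuple) item);
            } else {
                for (int value : (int[]) item) {
                    out.add(op.process(value));
                }
            }
        }
        return out;
    }
}
```

Because the change travels in the stream itself, replaying the same stream through a freshly restarted operator applies the change at the same window boundary and produces the same output.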
>>>>>> 
>>>>>> Please give your feedback and input on what you think about this
>>>>>> proposal.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
