I think this could solve the dynamic property change via StatsListener as
well.

Thanks
- Gaurav

> On Sep 28, 2015, at 11:09 AM, Gaurav Gupta <[email protected]> wrote:
>
> Pramod,
>
> Here is what I was thinking: currently the property value change happens
> at the window boundaries. Stram sends these Operator Requests to
> individual instances. If there are multiple instances of the same operator
> and there is a property change request on this operator, send the Operator
> Request to the instance that is farthest ahead and wait for the other
> instances to reach that window id before sending the Operator Request to
> them. This way you don’t need an additional special tuple?
>
> Does it make sense?
>
> Thanks
> - Gaurav
>
>> On Sep 28, 2015, at 10:43 AM, Pramod Immaneni <[email protected]> wrote:
>>
>> If OperatorRequest is out-of-band to the dataflow, which I think it is,
>> then that would most probably not be the mechanism to relay the property
>> change. We would possibly expose this proposed property change in an API
>> that could be used by StatsListener.
>>
>> On Mon, Sep 28, 2015 at 10:40 AM, Gaurav Gupta <[email protected]>
>> wrote:
>>
>>> Pramod,
>>>
>>> How would dynamic property change using OperatorRequest as part of
>>> StatsListener work with the new approach?
>>>
>>> Thanks
>>> - Gaurav
>>>
>>>> On Sep 28, 2015, at 10:30 AM, Pramod Immaneni <[email protected]>
>>>> wrote:
>>>>
>>>> An optimization that can be done is to perform the steps below only
>>>> when there is more than one input operator. In the single input
>>>> operator case, which is more common, the property change tuple can be
>>>> inserted at the next possible window without having to temporarily
>>>> pause the flow.
>>>>
>>>> On Mon, Sep 28, 2015 at 10:27 AM, Timothy Farkas <[email protected]>
>>>> wrote:
>>>>
>>>>> Furthermore, this approach is not limited to DAGs with a single input
>>>>> operator. In the case where a DAG has multiple input operators,
>>>>> property changes can be set within the same window across all input
>>>>> operators by enforcing some synchronization at the input operator
>>>>> level when setting the property. This synchronization would look like
>>>>> the following:
>>>>>
>>>>> 1. When receiving a property change request, ask all input operators
>>>>>    to stop and send their current window.
>>>>> 2. Take the max window + 1 (not technically correct but you get the
>>>>>    idea).
>>>>> 3. Send the property change request to all the input operators and
>>>>>    tell them to apply the change at the maximum window id + 1.
>>>>> 4. Resume the input operators.
>>>>>
>>>>> This ensures that the change is applied at the same window id and also
>>>>> ensures that the change is applied at a window id that the input
>>>>> operators have never played before. Therefore property changes will
>>>>> not interfere with the idempotence of operators.
>>>>>
>>>>>
>>>>> On Mon, Sep 28, 2015 at 9:17 AM, Pramod Immaneni <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Apex supports modification of operator properties at runtime but the
>>>>>> current implementation has the following shortcomings.
>>>>>>
>>>>>> 1. A property is not set across all partitions on the same window, as
>>>>>>    individual partitions can be on different windows when the
>>>>>>    property change is initiated from the client, resulting in
>>>>>>    inconsistency of data for those windows. I am being generous using
>>>>>>    the word inconsistent.
>>>>>> 2. Sometimes properties need to be set on more than one logical
>>>>>>    operator at the same time to achieve the change the user is
>>>>>>    seeking. Today they will be two separate changes happening on two
>>>>>>    different windows, again resulting in inconsistent data for some
>>>>>>    windows. These would need to happen as a single transaction.
>>>>>> 3. If there is an operator failure before a committed checkpoint
>>>>>>    after an operator property is dynamically changed, the operator
>>>>>>    will restart with the old property and the change will not be
>>>>>>    re-applied.
>>>>>>
>>>>>> Tim and I did some brainstorming and we have a proposal to overcome
>>>>>> these shortcomings. The main problem in all the above cases is that
>>>>>> the property changes are happening out-of-band of the data flow and
>>>>>> hence independent of windowing. The proposal is to bring the property
>>>>>> change requests into the in-band dataflow so that they are handled
>>>>>> consistently with windowing and handled distributively.
>>>>>>
>>>>>> The idea is to inject a special property change tuple, containing the
>>>>>> property changes and the identification information of the operators
>>>>>> they affect, into the dataflow at the input operator. The tuple will
>>>>>> be injected at a window boundary, after end window and before begin
>>>>>> window, and as this tuple flows through the DAG the intended
>>>>>> operators' properties will be modified. They will all be modified
>>>>>> consistently at the same window. The tuple can contain more than one
>>>>>> property change for more than one logical operator, and the changes
>>>>>> will be applied consistently to the different logical operators at
>>>>>> the same window. In case of failure, the replay of tuples will ensure
>>>>>> that the property change gets reapplied at the correct window.
>>>>>>
>>>>>> Please give your feedback and input on what you think about this
>>>>>> proposal.
>>>>>>
>>>>>> Thanks
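
To make the proposal and the "max window + 1" synchronization from the thread a bit more concrete, here is a rough Python sketch. It is only a toy model and not Apex code: the names PropertyChange, InputPartition, broadcast_change and demo are made up for illustration, window ids are treated as consecutive integers (which the thread itself notes is not technically correct), and the pause/resume of the input operators is implied rather than modelled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyChange:
    """Hypothetical in-band control tuple: target operator, property, new value."""
    operator: str
    name: str
    value: object


@dataclass
class InputPartition:
    """Toy stand-in for one partition of an input operator (not an Apex class)."""
    operator: str                                     # logical operator name
    window: int                                       # last window this partition completed
    properties: dict = field(default_factory=dict)
    scheduled: dict = field(default_factory=dict)     # window id -> pending changes
    applied_at: dict = field(default_factory=dict)    # property name -> window id

    def schedule(self, window_id, changes):
        self.scheduled.setdefault(window_id, []).extend(changes)

    def run_next_window(self):
        next_window = self.window + 1
        # Window boundary: the previous window has ended and the next one has
        # not begun, so this is where the change tuple is applied.
        for change in self.scheduled.pop(next_window, []):
            if change.operator == self.operator:
                self.properties[change.name] = change.value
                self.applied_at[change.name] = next_window
        self.window = next_window


def broadcast_change(partitions, changes):
    """Pick max window + 1 and schedule the changes there on every partition."""
    # Assumes the partitions are paused while their windows are read.
    target = max(p.window for p in partitions) + 1
    for p in partitions:
        p.schedule(target, changes)
    return target


def demo():
    # Three partitions that have progressed to different windows.
    partitions = [InputPartition("reader", w) for w in (10, 12, 11)]
    change = PropertyChange("reader", "batchSize", 500)
    target = broadcast_change(partitions, [change])
    # "Resume": each partition keeps running until it reaches the target window.
    for p in partitions:
        while p.window < target:
            p.run_next_window()
    return target, partitions
```

In this sketch every partition applies the change at the same window no matter how far it had progressed when the request arrived, and because the change is keyed by window id, replaying that window after a failure would apply it again at the same point.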
