The operators could have moved to a window beyond the one you are targeting by the time the property change request is received and processed. What happens when there are different logical operators whose properties need to be changed together? How would you handle replay on recovery? All of this would mean managing the window state of operators in Stram, which is not needed.

On Mon, Sep 28, 2015 at 11:09 AM, Gaurav Gupta <[email protected]> wrote:

> Pramod,
>
> Here is what I was thinking: currently the property value change
> happens at the boundary windows. Stram sends these Operator Requests to
> individual instances. If there are multiple instances of the same
> operator and there is a property change request on this operator, send
> the Operator Request change to the instance that is farthest ahead and
> wait for the other instances to come to that window id before sending
> the Operator Request to them. With this you don't need an additional
> special tuple?
>
> Does it make sense?
>
> Thanks
> - Gaurav
>
> On Sep 28, 2015, at 10:43 AM, Pramod Immaneni <[email protected]>
> wrote:
>
> > If OperatorRequest is out-of-band to dataflow, which I think it is,
> > then that would most probably not be the mechanism to relay property
> > change. We would possibly expose this proposed property change in an
> > API that could be used by StatsListener.
> >
> > On Mon, Sep 28, 2015 at 10:40 AM, Gaurav Gupta <[email protected]>
> > wrote:
> >
> >> Pramod,
> >>
> >> How would dynamic property change using OperatorRequest as part of
> >> StatsListener work with the new approach?
> >>
> >> Thanks
> >> - Gaurav
> >>
> >> On Sep 28, 2015, at 10:30 AM, Pramod Immaneni <[email protected]>
> >> wrote:
> >>
> >>> An optimization that can be done is to perform the steps below only
> >>> when there is more than one input operator. In the single input
> >>> operator case, which is more common, the property change tuple can
> >>> be inserted at the next possible window without having to
> >>> temporarily pause the flow.
> >>>
> >>> On Mon, Sep 28, 2015 at 10:27 AM, Timothy Farkas <[email protected]>
> >>> wrote:
> >>>
> >>>> Furthermore, this approach is not limited to DAGs with a single
> >>>> input operator. In the case where a DAG has multiple input
> >>>> operators, property changes can be set within the same window
> >>>> across all input operators by enforcing some synchronization at the
> >>>> input operator level when setting the property. This
> >>>> synchronization would look like the following:
> >>>>
> >>>> 1. When receiving a property change request, ask all input
> >>>>    operators to stop and send their current window.
> >>>> 2. Take the max window + 1 (not technically correct but you get
> >>>>    the idea).
> >>>> 3. Send the property change request to all the input operators and
> >>>>    tell them to apply the change at the maximum window id + 1.
> >>>> 4. Resume the input operators.
> >>>>
> >>>> This ensures that the change is applied at the same window id and
> >>>> also ensures that the change is applied at a window id that the
> >>>> input operators have never played before. Therefore property
> >>>> changes will not interfere with the idempotence of operators.
> >>>>
> >>>> On Mon, Sep 28, 2015 at 9:17 AM, Pramod Immaneni <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Apex supports modification of operator properties at runtime, but
> >>>>> the current implementation has the following shortcomings.
> >>>>>
> >>>>> 1. The property is not set across all partitions on the same
> >>>>>    window, as individual partitions can be on different windows
> >>>>>    when the property change is initiated from the client,
> >>>>>    resulting in inconsistency of data for those windows. I am
> >>>>>    being generous using the word inconsistent.
> >>>>> 2. Sometimes properties need to be set on more than one logical
> >>>>>    operator at the same time to achieve the change the user is
> >>>>>    seeking. Today they will be two separate changes happening on
> >>>>>    two different windows, again resulting in inconsistent data for
> >>>>>    some windows. These would need to happen as a single
> >>>>>    transaction.
> >>>>> 3. If there is an operator failure before a committed checkpoint
> >>>>>    after an operator property is dynamically changed, the operator
> >>>>>    will restart with the old property and the change will not be
> >>>>>    re-applied.
> >>>>>
> >>>>> Tim and I did some brainstorming and we have a proposal to
> >>>>> overcome these shortcomings. The main problem in all the above
> >>>>> cases is that the property changes are happening out-of-band of
> >>>>> the data flow and hence independent of windowing. The proposal is
> >>>>> to bring the property change request into the in-band dataflow so
> >>>>> that changes are handled consistently with windowing and handled
> >>>>> distributively.
> >>>>>
> >>>>> The idea is to inject a special property change tuple, containing
> >>>>> the property changes and the identification information of the
> >>>>> operators they affect, into the dataflow at the input operator.
> >>>>> The tuple will be injected at a window boundary, after end window
> >>>>> and before begin window, and as this tuple flows through the DAG
> >>>>> the intended operators' properties will be modified. They will all
> >>>>> be modified consistently at the same window. The tuple can contain
> >>>>> more than one property change for more than one logical operator,
> >>>>> and the change will be applied consistently to the different
> >>>>> logical operators at the same window. In case of failure the
> >>>>> replay of tuples will ensure that the property change gets
> >>>>> reapplied at the correct window.
> >>>>>
> >>>>> Please give your feedback and input on what you think about this
> >>>>> proposal.
> >>>>>
> >>>>> Thanks
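
For reference, the four synchronization steps Tim lists in the quoted thread can be sketched roughly as below. This is only an illustration under stated assumptions: `InputOperator` and `apply_property_change` are hypothetical names, not Apex classes or APIs, and window ids are treated as plain consecutive integers, which the thread itself notes is "not technically correct".

```python
from dataclasses import dataclass, field


@dataclass
class InputOperator:
    """Hypothetical stand-in for one input operator; not an Apex class."""
    name: str
    current_window: int
    paused: bool = False
    properties: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # window id -> property changes

    def pause(self) -> int:
        # Step 1: stop and report the window currently being played.
        self.paused = True
        return self.current_window

    def schedule(self, window_id: int, changes: dict) -> None:
        # Step 3: remember the change; it takes effect when window_id begins.
        self.pending.setdefault(window_id, {}).update(changes)

    def resume(self) -> None:
        # Step 4: allow the operator to play windows again.
        self.paused = False

    def advance(self) -> None:
        # Move to the next window. Any change scheduled for that window is
        # applied at the boundary, before the new window's data is played.
        if self.paused:
            raise RuntimeError(f"{self.name} is paused")
        self.current_window += 1
        self.properties.update(self.pending.pop(self.current_window, {}))


def apply_property_change(operators, changes):
    """Schedule `changes` on every input operator at one common future window."""
    reported = [op.pause() for op in operators]  # step 1
    target = max(reported) + 1                   # step 2 (simplified window math)
    for op in operators:
        op.schedule(target, changes)             # step 3
    for op in operators:
        op.resume()                              # step 4
    return target


ops = [InputOperator("a", 10), InputOperator("b", 12)]
target = apply_property_change(ops, {"threshold": 5})
```

Here the slower operator `a` keeps its old properties through windows 11 and 12 and picks up the change only when it reaches the target window, the same window at which `b` applies it. Checkpointing and replay are not modeled in this sketch.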
