Pause is hard to pull off. It has a lot of side effects on scale and on external systems, which now have to back up. As the number of operators grows, the algorithm halts more. With data in motion, a backlog will build up during the pause, especially within external systems. The problem occurs even if we have one logical input operator with N partitions.
A much quicker way, though not with a technical guarantee, would be to let users decide a window id increment in the future. The command may then be "let me set properties on these operators N windows in the future off the current max window id among them". A user can then use a high enough N to get 99.99% certainty that the window is aligned.

Thks,
Amol

On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <[email protected]> wrote:

> The case where there is no common ancestor also has to be handled. For
> example you may need to change a property on two different input operators.
> In this case the property needs to be set on both operators before the same
> window. This also needs to be done the first time a window is computed by
> an input operator, otherwise there would be issues with idempotence. This
> could be achieved by doing the following:
>
> 1. Input operators would have to be paused when setting a property.
> 2. They would report their window id.
> 3. Then the max window id needs to be picked.
> 4. Then the property needs to be scheduled to be set at the appropriate
>    window.
> 5. Then the input operators are resumed.
>
> Thanks,
> Tim
>
> On Oct 1, 2015 5:39 PM, "Amol Kekre" <[email protected]> wrote:
>
> > The issue comes up when a property has to be changed in multiple
> > operators, logical or physical. Since it does not matter if this is
> > triggered by an input adapter or any parent of these operators, stram
> > can pick a common ancestor. Property change commands (operator id, prop
> > name, prop val) can be inserted by the stramchild of the common ancestor.
> >
> > Thks
> > Amol
> >
> > Sent from my iPhone
> >
> > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <[email protected]> wrote:
> > >
> > > Pramod,
> > >
> > > The new special property change tuple will be sent to all the
> > > operators, and all the operators will have to check if the property
> > > change is applicable to them. Although such requests may be very few,
> > > is there a way to optimize it?
> > >
> > > Thanks
> > > - Gaurav
> > >
> > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <[email protected]> wrote:
> > >>
> > >> At the platform level that cannot be guaranteed as your operator
> > >> controls and manages reading of the data. However it is not difficult
> > >> to envision writing an operator that would pick up a new dataset when
> > >> a property is changed.
> > >>
> > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <[email protected]> wrote:
> > >>
> > >>> Great, looking forward to these changes. Does it also provide a
> > >>> guarantee on which properties are used for which input data sets?
> > >>>
> > >>> A few use case examples:
> > >>> - set property between reads of different batches of files. Say,
> > >>>   applying batch name property before processing the next batch of
> > >>>   files.
> > >>> - load new configuration file for csv parser before processing next
> > >>>   set of data.
> > >>> - apply new regex before parsing next stream of tuples.
> > >>> etc.
> > >>>
> > >>> One approach to allow this is to emit subsequent tuples only starting
> > >>> with the window after the one in which the property change is made.
> > >>> That way, the boundaries between data sets are fixed and the property
> > >>> change is done in between. The user will now have a guarantee on
> > >>> which property value is used on any given tuple.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> Regards,
> > >>> Ashwin.
> > >>>
> > >>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <[email protected]> wrote:
> > >>>
> > >>>> Apex supports modification of operator properties at runtime but the
> > >>>> current implementation has the following shortcomings.
> > >>>>
> > >>>> 1. Property is not set across all partitions on the same window, as
> > >>>> individual partitions can be on different windows when the property
> > >>>> change is initiated from the client, resulting in inconsistency of
> > >>>> data for those windows. I am being generous using the word
> > >>>> inconsistent.
> > >>>> 2. Sometimes properties need to be set on more than one logical
> > >>>> operator at the same time to achieve the change the user is seeking.
> > >>>> Today they will be two separate changes happening on two different
> > >>>> windows, again resulting in inconsistent data for some windows.
> > >>>> These would need to happen as a single transaction.
> > >>>> 3. If there is an operator failure before a committed checkpoint
> > >>>> after an operator property is dynamically changed, the operator will
> > >>>> restart with the old property and the change will not be re-applied.
> > >>>>
> > >>>> Tim and I did some brainstorming and we have a proposal to overcome
> > >>>> these shortcomings. The main problem in all the above cases is that
> > >>>> the property changes are happening out-of-band of data flow and
> > >>>> hence independent of windowing. The proposal is to bring the
> > >>>> property change request into the in-band dataflow so that they are
> > >>>> handled consistently with windowing and handled distributively.
> > >>>>
> > >>>> The idea is to inject a special property change tuple containing the
> > >>>> property changes and the identification information of the operators
> > >>>> they affect into the dataflow at the input operator. The tuple will
> > >>>> be injected at the window boundary, after end window and before
> > >>>> begin window, and as this tuple flows through the DAG the intended
> > >>>> operators' properties will be modified. They will all be modified
> > >>>> consistently at the same window. The tuple can contain more than one
> > >>>> property change for more than one logical operator and the change
> > >>>> will be applied consistently to the different logical operators at
> > >>>> the same window. In case of failure the replay of tuples will ensure
> > >>>> that the property change gets reapplied at the correct window.
> > >>>>
> > >>>> Please give your feedback and input on what you think about this
> > >>>> proposal.
> > >>>>
> > >>>> Thanks
> > >>>
> > >>> --
> > >>>
> > >>> Regards,
> > >>> Ashwin.
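
For illustration, the "N windows in the future off the current max window id" idea from the top of this message could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Apex code: `SketchOperator`, `targetWindow` and `scheduleChange` are hypothetical names, window ids are treated as plain increasing longs, and the delivery of the command to each operator is assumed to be instantaneous.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FutureWindowPropertyChange {

    /** Hypothetical stand-in for an operator partition; not the Apex Operator API. */
    static class SketchOperator {
        final String name;
        long currentWindowId;
        final Map<String, String> properties = new HashMap<>();
        // Pending changes, keyed by the window id at which they take effect.
        final Map<Long, Map<String, String>> pending = new HashMap<>();

        SketchOperator(String name, long currentWindowId) {
            this.name = name;
            this.currentWindowId = currentWindowId;
        }

        void schedule(long windowId, String prop, String value) {
            pending.computeIfAbsent(windowId, k -> new HashMap<>()).put(prop, value);
        }

        /** Called at the start of each window; applies any change scheduled for it. */
        void beginWindow(long windowId) {
            currentWindowId = windowId;
            Map<String, String> changes = pending.remove(windowId);
            if (changes != null) {
                properties.putAll(changes);
            }
        }
    }

    /** Max current window id among the operators, plus the user-chosen offset n. */
    static long targetWindow(List<SketchOperator> ops, long n) {
        if (ops.isEmpty()) {
            throw new IllegalArgumentException("no operators given");
        }
        long max = Long.MIN_VALUE;
        for (SketchOperator op : ops) {
            max = Math.max(max, op.currentWindowId);
        }
        return max + n;
    }

    /** Schedules the same change on every operator at the same future window. */
    static long scheduleChange(List<SketchOperator> ops, long n, String prop, String value) {
        long target = targetWindow(ops, n);
        for (SketchOperator op : ops) {
            op.schedule(target, prop, value);
        }
        return target;
    }

    public static void main(String[] args) {
        // Two partitions that are on different windows when the command arrives.
        SketchOperator a = new SketchOperator("input-a", 100);
        SketchOperator b = new SketchOperator("input-b", 103);
        List<SketchOperator> ops = Arrays.asList(a, b);

        long target = scheduleChange(ops, 10, "regex", "[0-9]+");
        System.out.println("change scheduled for window " + target);

        // Each partition advances at its own pace; both apply the change at 'target'.
        for (SketchOperator op : ops) {
            for (long w = op.currentWindowId + 1; w <= target; w++) {
                op.beginWindow(w);
            }
            System.out.println(op.name + " -> " + op.properties);
        }
    }
}
```

The sketch leaves out the part that makes this approach a probability rather than a guarantee: in a real deployment the command takes time to reach each operator, so an operator may already be past the target window when it arrives. A larger N lowers that risk at the cost of a longer delay before the change takes effect.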
