This works well. How about not waiting for a go-ahead from the master, and
just giving the input operators a go-ahead? That way the success rate goes up
a lot. On the con side, the control tuple has to traverse the entire DAG.

Thks,
Amol


On Wed, Oct 7, 2015 at 2:48 PM, Timothy Farkas <[email protected]> wrote:

> I think we could achieve a 100% guarantee without (unnecessarily) pausing
> operators. This could be achieved by making a small addition to the above
> approach.
>
> 1.) pick a window N windows ahead of the current max window of the
> operators. Let's call this window W
> 2.) Send a property change request to the operators to change the property
> on window W
> 3.) As part of the property change request the operator will do one of two
> things:
>      a.) Reply with a failure if it has already passed window W.
>      b.) Reply with a success if it has not yet passed window W.
> 4.) Operators which replied with a success will asynchronously wait for a
> confirmation message to apply the property. If the operator reaches window
> W before it receives the confirmation, the operator will block until a
> confirmation is received.
> 5.) Meanwhile the app master collects the responses to the property change
> requests. If all the property change requests responded with a success,
> then a confirmation message is sent to all the operators to apply the
> property. If one or more of the operators replied with failure, then a
> property change cancellation is sent to all the operators, and then the
> whole process is retried.
>
> This way 99.99% of the time a property change would be applied without
> pausing operators. Operators will only be paused on rare occasions, and
> only for the sake of preventing application errors that could be triggered
> by an incorrect application of a property.
>
> Thanks,
> Tim
>
>
>
> On Sun, Oct 4, 2015 at 9:40 PM, Amol Kekre <[email protected]> wrote:
>
> > Pause is hard to pull off. It has a lot of other side effects and
> > consequences on scale and on external systems that now have to back up.
> > As the number of operators grows, the algorithm halts more. Data in
> > motion means that a backlog will build up during the pause, especially
> > within external systems. The problem occurs even if we have one logical
> > input operator with N partitions.
> >
> > A much quicker way, though without a technical guarantee, would be to let
> > users pick a window id some increment into the future. The command may
> > then be "let me set properties on these operators N windows in the future
> > off the current max window id among them". A user can then use a high
> > enough N to get 99.99% certainty that the window is aligned.
> >
> > Thks,
> > Amol
> >
> >
> > On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <[email protected]>
> > wrote:
> >
> > > The case where there is no common ancestor also has to be handled. For
> > > example, you may need to change a property on two different input
> > > operators. In this case the property needs to be set on both operators
> > > before the same window. This also needs to be done the first time a
> > > window is computed by an input operator, otherwise there would be
> > > issues with idempotence. This could be achieved by doing the following:
> > >
> > > 1. The input operators would have to be paused when setting a property.
> > > 2. They would report their window ids.
> > > 3. Then the max window id needs to be picked.
> > > 4. Then the property needs to be scheduled to be set at the appropriate
> > > window.
> > > 5. Then the input operators are resumed.
> > >
> > > Thanks,
> > > Tim
> > > On Oct 1, 2015 5:39 PM, "Amol Kekre" <[email protected]> wrote:
> > >
> > > >
> > > > > The issue comes up when a property has to be changed in multiple
> > > > > operators, logical or physical. Since it does not matter whether
> > > > > this is triggered by an input adapter or by any parent of these
> > > > > operators, stram can pick a common ancestor. Property change
> > > > > commands (operator id, prop name, prop val) can be inserted by the
> > > > > stramchild of the common ancestor.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <[email protected]>
> > > wrote:
> > > > >
> > > > > Pramod,
> > > > >
> > > > > > The new special property change tuple will be sent to all the
> > > > > > operators, and every operator will have to check whether the
> > > > > > property change applies to it. Such requests may be very few, but
> > > > > > is there a way to optimize this?
> > > > >
> > > > > Thanks
> > > > > - Gaurav
> > > > >
> > > > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <
> > [email protected]>
> > > > wrote:
> > > > >>
> > > > > >> At the platform level that cannot be guaranteed, as your operator
> > > > > >> controls and manages reading of the data. However, it is not
> > > > > >> difficult to envision writing an operator that would pick up a
> > > > > >> new dataset when a property is changed.
> > > > >>
> > > > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> > > > >> [email protected]> wrote:
> > > > >>
> > > > > >>> Great, looking forward to these changes. Does it also provide a
> > > > > >>> guarantee on which properties are used for which input data sets?
> > > > >>>
> > > > > >>> A few use case examples:
> > > > > >>> - Set a property between reads of different batches of files. Say,
> > > > > >>>   applying a batch name property before processing the next batch
> > > > > >>>   of files.
> > > > > >>> - Load a new configuration file for a CSV parser before processing
> > > > > >>>   the next set of data.
> > > > > >>> - Apply a new regex before parsing the next stream of tuples.
> > > > > >>> etc.
> > > > >>>
> > > > > >>> One approach to allow this is to emit subsequent tuples only
> > > > > >>> starting with the window after the one in which the property
> > > > > >>> change is made. That way, the boundaries between data sets are
> > > > > >>> fixed and the property change is done in between. The user will
> > > > > >>> then have a guarantee on which property value is used on any
> > > > > >>> given tuple.
> > > > >>>
> > > > >>> Thoughts?
> > > > >>>
> > > > >>> Regards,
> > > > >>> Ashwin.
> > > > >>>
> > > > >>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <
> > > > [email protected]>
> > > > >>> wrote:
> > > > >>>
> > > > > >>>> Apex supports modification of operator properties at runtime, but
> > > > > >>>> the current implementation has the following shortcomings.
> > > > >>>>
> > > > > >>>> 1. A property is not set across all partitions on the same window,
> > > > > >>>> as individual partitions can be on different windows when the
> > > > > >>>> property change is initiated from the client, resulting in
> > > > > >>>> inconsistency of data for those windows. I am being generous
> > > > > >>>> using the word inconsistent.
> > > > > >>>> 2. Sometimes properties need to be set on more than one logical
> > > > > >>>> operator at the same time to achieve the change the user is
> > > > > >>>> seeking. Today these will be two separate changes happening on
> > > > > >>>> two different windows, again resulting in inconsistent data for
> > > > > >>>> some windows. These would need to happen as a single transaction.
> > > > > >>>> 3. If there is an operator failure before a committed checkpoint
> > > > > >>>> after an operator property is dynamically changed, the operator
> > > > > >>>> will restart with the old property and the change will not be
> > > > > >>>> re-applied.
> > > > >>>>
> > > > > >>>> Tim and I did some brainstorming and we have a proposal to
> > > > > >>>> overcome these shortcomings. The main problem in all the above
> > > > > >>>> cases is that the property changes are happening out-of-band of
> > > > > >>>> the data flow and hence independent of windowing. The proposal is
> > > > > >>>> to bring the property change requests into the in-band dataflow
> > > > > >>>> so that they are handled consistently with windowing and handled
> > > > > >>>> in a distributed manner.
> > > > >>>>
> > > > > >>>> The idea is to inject a special property change tuple, containing
> > > > > >>>> the property changes and the identification information of the
> > > > > >>>> operators they affect, into the dataflow at the input operator.
> > > > > >>>> The tuple will be injected at a window boundary, after end window
> > > > > >>>> and before begin window, and as this tuple flows through the DAG
> > > > > >>>> the intended operators' properties will be modified. They will
> > > > > >>>> all be modified consistently at the same window. The tuple can
> > > > > >>>> contain more than one property change for more than one logical
> > > > > >>>> operator, and the changes will be applied consistently to the
> > > > > >>>> different logical operators at the same window. In case of
> > > > > >>>> failure, the replay of tuples will ensure that the property
> > > > > >>>> change gets reapplied at the correct window.
> > > > >>>>
> > > > > >>>> Please give your feedback and input on what you think about this
> > > > > >>>> proposal.
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> Regards,
> > > > >>> Ashwin.
> > > > >
> > > >
> > >
> >
>
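
For readers of the thread, here is a rough sketch of the request/confirm
protocol Tim outlines in his Oct 7 message (steps 1 to 5). It is only an
illustration under stated assumptions, not Apex code: the Operator class,
change_property function, and all field names are hypothetical, messaging is
simulated with direct method calls, and an operator that has already begun
window W is assumed to count as having "passed" it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Operator:
    """Hypothetical stand-in for an operator; not an Apex class."""
    name: str
    current_window: int
    properties: Dict[str, Any] = field(default_factory=dict)
    pending: Optional[Tuple[int, str, Any]] = None  # (window W, prop, value)
    confirmed: bool = False

    def request_change(self, window: int, prop: str, value: Any) -> bool:
        # Step 3: reply with failure if window W was already reached
        # (assumption: being in W is too late), otherwise hold the change.
        if self.current_window >= window:
            return False
        self.pending = (window, prop, value)
        self.confirmed = False
        return True

    def confirm(self) -> None:
        # Confirmation from the app master: the held change may be applied.
        self.confirmed = True

    def cancel(self) -> None:
        # Cancellation from the app master: drop the held change.
        self.pending = None
        self.confirmed = False

    def advance(self) -> bool:
        """Try to begin the next window; return False if blocked at W."""
        nxt = self.current_window + 1
        if self.pending is not None and nxt == self.pending[0]:
            if not self.confirmed:
                return False  # Step 4: block until a confirmation arrives
            _, prop, value = self.pending
            self.properties[prop] = value  # applied right before window W
            self.pending, self.confirmed = None, False
        self.current_window = nxt
        return True


def change_property(operators: List[Operator], prop: str, value: Any,
                    n_ahead: int, max_retries: int = 3) -> Optional[int]:
    """App-master side: return the window W that was agreed on, or None."""
    for _ in range(max_retries):
        # Step 1: pick W, N windows ahead of the current max window.
        target = max(op.current_window for op in operators) + n_ahead
        # Steps 2-3: send the request and collect success/failure replies.
        replies = [op.request_change(target, prop, value) for op in operators]
        # Step 5: confirm only if every operator replied with success.
        if all(replies):
            for op in operators:
                op.confirm()
            return target
        for op in operators:
            op.cancel()
    return None
```

In this single-threaded simulation the request always succeeds for n_ahead >= 1,
because no operator advances between picking W and sending the requests; in a
real deployment operators keep running, which is where the failure reply and
the retry come in.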
