Re: dynamic application properties proposal

Pramod Immaneni Thu, 08 Oct 2015 09:54:12 -0700

The user would not be picking the window id, as they cannot correlate it with
anything. From a user's perspective, setting a property will look very much
the way it does today, except for the additional ability to set properties on
more than one operator. Stram will not be managing window state. The property
tuple will be injected at the input operator at the next possible window,
except when there are multiple input operators; in that case the protocol Tim
mentioned in the earlier email will be used.


On Thu, Oct 8, 2015 at 9:36 AM, Gaurav Gupta <[email protected]> wrote:

> If the user is manually setting the property, then the user can pick the
> windowId at which to apply the property change, but how would a dynamic
> property change using OperatorRequest as part of a StatsListener work?
>
> Does this not mean that Stram will have to start managing the window state
> of operators?
>
>
> Thanks
> -Gaurav
>
> On Wed, Oct 7, 2015 at 2:48 PM, Timothy Farkas <[email protected]>
> wrote:
>
> > I think we could achieve a 100% guarantee without (unnecessarily) pausing
> > operators. This could be achieved by making a small addition to the above
> > approach.
> >
> > 1.) Pick a window N windows ahead of the current max window of the
> > operators. Let's call this window W.
> > 2.) Send a property change request to the operators to change the property
> > on window W.
> > 3.) As part of the property change request, the operator will do one of two
> > things:
> >      a. Reply with a failure if it has already passed window W.
> >      b. Reply with a success if it has not yet passed window W.
> > 4.) Operators which replied with a success will asynchronously wait for a
> > confirmation message to apply the property. If the operator reaches window
> > W before it receives the confirmation, the operator will block until a
> > confirmation is received.
> > 5.) Meanwhile, the app master collects the responses to the property change
> > requests. If all the property change requests responded with a success,
> > then a confirmation message is sent to all the operators to apply the
> > property. If one or more of the operators replied with a failure, then a
> > property change cancellation is sent to all the operators, and then the
> > whole process is retried.
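A minimal, single-process Python sketch of the five steps above, assuming integer window ids that increase by one per window. `Operator`, `Pending`, `prepare`, `confirm`, `cancel`, `begin_window` and `change_property` are hypothetical names, not Apex APIs. Real operators advance concurrently, which is what makes the failure/retry path necessary; the sketch also assumes a blocked operator is released by a cancellation as well as by a confirmation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pending:
    window: int              # target window W
    prop: str
    value: str
    confirmed: bool = False


@dataclass
class Operator:
    """Hypothetical stand-in for one physical operator partition."""
    name: str
    current_window: int
    props: Dict[str, str] = field(default_factory=dict)
    pending: Optional[Pending] = None

    def prepare(self, w: int, prop: str, value: str) -> bool:
        # Step 3: too late if window W has already started; otherwise accept.
        if self.current_window >= w:
            return False
        self.pending = Pending(w, prop, value)
        return True

    def confirm(self) -> None:
        if self.pending is not None:
            self.pending.confirmed = True

    def cancel(self) -> None:
        self.pending = None

    def begin_window(self, window_id: int) -> bool:
        """Start a window; returns False if the operator must block (step 4)."""
        p = self.pending
        if p is not None and window_id >= p.window:
            if not p.confirmed:
                return False                  # reached W unconfirmed: block
            self.props[p.prop] = p.value      # apply right at the start of W
            self.pending = None
        self.current_window = window_id
        return True


def change_property(ops: List[Operator], n: int, prop: str, value: str,
                    max_attempts: int = 3) -> int:
    """App-master side of the protocol; returns the window W that was used."""
    for _ in range(max_attempts):
        w = max(op.current_window for op in ops) + n          # step 1
        replies = [op.prepare(w, prop, value) for op in ops]  # steps 2-3
        if all(replies):
            for op in ops:
                op.confirm()                                  # step 5: commit
            return w
        for op in ops:
            op.cancel()                                       # step 5: abort, retry
    raise RuntimeError("property change could not be scheduled")
```

In this model an operator only ever blocks in the window between reaching W and receiving the confirmation, which matches the claim that pausing is rare rather than routine.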
> >
> > This way, 99.99% of the time a property change would be applied without
> > pausing operators. Operators will only be paused on rare occasions, and
> > only to prevent application errors that could be triggered by an incorrect
> > application of a property.
> >
> > Thanks,
> > Tim
> >
> >
> >
> > On Sun, Oct 4, 2015 at 9:40 PM, Amol Kekre <[email protected]> wrote:
> >
> > > Pause is hard to pull off. It has a lot of other side effects and
> > > consequences on scale and on external systems, which now have to back up.
> > > As the number of operators grows, the algorithm halts more. With data in
> > > motion, a backlog will build up during the pause, especially within
> > > external systems. The problem occurs even if we have one logical input
> > > operator with N partitions.
> > >
> > > A much quicker way, though not with a technical guarantee, would be to let
> > > users decide a window id increment in the future. The command may then be
> > > "let me set properties on these operators N windows in the future off the
> > > current max window id among them". A user can then use a high enough N to
> > > get 99.99% certainty that the window is aligned.
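To make the arithmetic of this suggestion concrete, here is a small sketch assuming consecutive integer window ids; the function names and the window ids in the example are made up for illustration. It shows why there is no hard guarantee: a partition that has already reached the target window when the command arrives misses the alignment, and a larger N only makes that less likely.

```python
from typing import Dict, List


def target_window(current: Dict[str, int], n: int) -> int:
    """N windows past the current max window id among the operators."""
    return max(current.values()) + n


def late_operators(target: int, at_delivery: Dict[str, int]) -> List[str]:
    """Operators that had already begun the target window by the time the
    command reached them; for these the change would not be aligned."""
    return sorted(name for name, wid in at_delivery.items() if wid >= target)


# Window ids seen when the command is issued, and (hypothetically) the ids
# each partition has reached by the time the command is delivered to it.
observed = {"p1": 100, "p2": 98, "p3": 101}
delivered = {"p1": 104, "p2": 103, "p3": 106}
for n in (5, 10):
    w = target_window(observed, n)
    print(n, w, late_operators(w, delivered))
# 5 106 ['p3']
# 10 111 []
```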
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > > On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <[email protected]>
> > > wrote:
> > >
> > > > The case where there is no common ancestor also has to be handled. For
> > > > example, you may need to change a property on two different input
> > > > operators. In this case the property needs to be set on both operators
> > > > before the same window. This also needs to be done the first time a
> > > > window is computed by an input operator; otherwise there would be issues
> > > > with idempotence. This could be achieved by doing the following:
> > > >
> > > > 1. Input operators would have to be paused when setting a property.
> > > > 2. They would report their window id.
> > > > 3. Then the max window id needs to be picked.
> > > > 4. Then the property needs to be scheduled to be set at the appropriate
> > > > window.
> > > > 5. Then the input operators are resumed.
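A single-threaded sketch of the pause-based steps, assuming integer window ids. It assumes "the appropriate window" in step 4 is the first window after the max reported id, i.e. one that no input operator has started yet; `InputOperator` and `set_property_with_pause` are hypothetical names, and in a real system pausing would block each operator's thread rather than set a flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InputOperator:
    """Hypothetical input operator; `window` is the id of the last window it began."""
    name: str
    window: int
    paused: bool = False
    props: Dict[str, str] = field(default_factory=dict)
    scheduled: Dict[int, List[Tuple[str, str]]] = field(default_factory=dict)

    def begin_window(self) -> None:
        if self.paused:
            raise RuntimeError(f"{self.name} is paused")
        self.window += 1
        # Apply any property scheduled for this window before emitting data.
        for prop, value in self.scheduled.pop(self.window, []):
            self.props[prop] = value


def set_property_with_pause(inputs: List[InputOperator], prop: str, value: str) -> int:
    for op in inputs:                         # 1. pause the input operators
        op.paused = True
    reported = [op.window for op in inputs]   # 2. each reports its window id
    w = max(reported) + 1                     # 3. max id; assumed target = next window
    for op in inputs:                         # 4. schedule the change at window w
        op.scheduled.setdefault(w, []).append((prop, value))
    for op in inputs:                         # 5. resume
        op.paused = False
    return w
```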
> > > >
> > > > Thanks,
> > > > Tim
> > > > On Oct 1, 2015 5:39 PM, "Amol Kekre" <[email protected]> wrote:
> > > >
> > > > >
> > > > > The issue comes up when a property has to be changed in multiple
> > > > > operators, logical or physical. Since it does not matter whether this
> > > > > is triggered by an input adapter or by any parent of these operators,
> > > > > stram can pick a common ancestor. Property change commands (operator
> > > > > id, prop name, prop val) can be inserted by the stramchild of the
> > > > > common ancestor.
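One way stram could pick the common ancestor, sketched in Python over a DAG given as an adjacency map of operator names (an assumed representation, not Apex's own DAG classes). It walks upstream from each target, intersects the ancestor sets, and keeps a lowest candidate. An empty intersection is the no-common-ancestor case, for example targets fed by different input operators.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def common_ancestor(dag: Dict[str, List[str]], targets: Iterable[str]) -> Optional[str]:
    """Return a lowest operator upstream of (or equal to) every target, or None.
    `targets` must be non-empty."""
    parents: Dict[str, List[str]] = {}
    for src, dsts in dag.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    def upstream(node: str) -> Set[str]:
        # Breadth-first walk against the stream direction; includes `node` itself.
        seen, queue = {node}, deque([node])
        while queue:
            for p in parents.get(queue.popleft(), []):
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
        return seen

    common = set.intersection(*(upstream(t) for t in targets))
    if not common:
        return None  # no operator feeds all targets
    # Keep candidates that are not upstream of another candidate ("lowest").
    lowest = [c for c in common if not any(o != c and c in upstream(o) for o in common)]
    return sorted(lowest)[0]  # a DAG may have several; pick one deterministically
```

For a DAG such as `input -> parser -> {filter, enrich} -> writer`, changing properties on `filter` and `enrich` would be injected at `parser`, not at `input`.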
> > > > >
> > > > > Thks
> > > > > Amol
> > > > >
> > > > > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <[email protected]> wrote:
> > > > > >
> > > > > > Pramod,
> > > > > >
> > > > > > The new special property change tuple will be sent to all the
> > > > > > operators, and all the operators will have to check whether the
> > > > > > property change is applicable to them. Although such requests may be
> > > > > > very few, is there a way to optimize this?
> > > > > >
> > > > > > Thanks
> > > > > > - Gaurav
> > > > > >
> > > > > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <[email protected]> wrote:
> > > > > >>
> > > > > >> At the platform level that cannot be guaranteed, as your operator
> > > > > >> controls and manages the reading of the data. However, it is not
> > > > > >> difficult to envision writing an operator that would pick up a new
> > > > > >> dataset when a property is changed.
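A sketch of such an operator, assuming the platform delivers the new property value through a setter that may be called mid-window. `DatasetReader` and its methods are hypothetical; the point is that the switch is deferred to `begin_window`, so one window never mixes tuples from two datasets.

```python
from typing import Dict, List, Optional


class DatasetReader:
    """Hypothetical input operator whose `dataset` property may change at any
    time but only takes effect at the next window boundary."""

    def __init__(self, datasets: Dict[str, List[str]], dataset: str, batch_size: int = 2):
        self._datasets = datasets
        self._active = dataset
        self._requested: Optional[str] = None
        self._pos = 0
        self._batch_size = batch_size  # tuples emitted per emit_tuples() call

    def set_dataset(self, name: str) -> None:
        # Property setter; may be called in the middle of a window.
        self._requested = name

    def begin_window(self) -> None:
        # Switch datasets only here, so a window never mixes two datasets.
        if self._requested is not None and self._requested != self._active:
            self._active, self._pos = self._requested, 0
        self._requested = None

    def emit_tuples(self) -> List[str]:
        rows = self._datasets[self._active]
        batch = rows[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch
```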
> > > > > >>
> > > > > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> > > > > >> [email protected]> wrote:
> > > > > >>
> > > > > >>> Great, looking forward to these changes. Does it also provide a
> > > > > >>> guarantee on which properties are used for which input data sets?
> > > > > >>>
> > > > > >>> A few use case examples:
> > > > > >>> - set a property between reads of different batches of files, say,
> > > > > >>> applying a batch name property before processing the next batch of
> > > > > >>> files.
> > > > > >>> - load a new configuration file for the CSV parser before processing
> > > > > >>> the next set of data.
> > > > > >>> - apply a new regex before parsing the next stream of tuples.
> > > > > >>> etc.
> > > > > >>>
> > > > > >>> One approach to allow this is to emit subsequent tuples only starting
> > > > > >>> from the window after the one in which the property change is made.
> > > > > >>> That way, the boundaries between data sets are fixed and the
> > > > > >>> property change is done in between. The user will then have a
> > > > > >>> guarantee on which property value is used on any given tuple.
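A sketch of the effect of this approach, assuming a simplified model in which a window holds a fixed number of tuples and "stop emitting for the rest of this window" is modelled as ending that window early. `assign_windows` is a hypothetical helper; its output tags every window with exactly one property value, which is the guarantee described above.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # ("tuple", data) or ("set", new property value)


def assign_windows(events: List[Event], initial: str,
                   window_size: int) -> List[Tuple[str, List[str]]]:
    """Group tuples into windows, each tagged with the single property value in
    effect for it. After a ("set", v) event the input stops emitting for the
    rest of the current window, so every later tuple lands where v applies."""
    windows: List[Tuple[str, List[str]]] = []
    value, current = initial, []
    for kind, payload in events:
        if kind == "set":
            if current:                      # close the window with old data only
                windows.append((value, current))
                current = []
            value = payload                  # takes effect at the window boundary
        else:
            current.append(payload)
            if len(current) == window_size:
                windows.append((value, current))
                current = []
    if current:
        windows.append((value, current))
    return windows
```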
> > > > > >>>
> > > > > >>> Thoughts?
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Ashwin.
> > > > > >>>
> > > > > >>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <[email protected]> wrote:
> > > > > >>>
> > > > > >>>> Apex supports modification of operator properties at runtime, but the
> > > > > >>>> current implementation has the following shortcomings.
> > > > > >>>>
> > > > > >>>> 1. A property is not set across all partitions on the same window, as
> > > > > >>>> individual partitions can be on different windows when the property
> > > > > >>>> change is initiated from the client, resulting in inconsistent data
> > > > > >>>> for those windows. I am being generous using the word inconsistent.
> > > > > >>>> 2. Sometimes properties need to be set on more than one logical
> > > > > >>>> operator at the same time to achieve the change the user is seeking.
> > > > > >>>> Today they will be two separate changes happening on two different
> > > > > >>>> windows, again resulting in inconsistent data for some windows. These
> > > > > >>>> would need to happen as a single transaction.
> > > > > >>>> 3. If there is an operator failure before a committed checkpoint after
> > > > > >>>> an operator property is dynamically changed, the operator will
> > > > > >>>> restart with the old property and the change will not be re-applied.
> > > > > >>>>
> > > > > >>>> Tim and I did some brainstorming, and we have a proposal to overcome
> > > > > >>>> these shortcomings. The main problem in all the above cases is that
> > > > > >>>> the property changes happen out-of-band of the data flow and are
> > > > > >>>> hence independent of windowing. The proposal is to bring the property
> > > > > >>>> change requests into the in-band data flow so that they are handled
> > > > > >>>> consistently with windowing and in a distributed manner.
> > > > > >>>>
> > > > > >>>> The idea is to inject a special property change tuple, containing the
> > > > > >>>> property changes and identification information for the operators
> > > > > >>>> they affect, into the data flow at the input operator. The tuple will
> > > > > >>>> be injected at a window boundary, after end window and before begin
> > > > > >>>> window, and as this tuple flows through the DAG the intended
> > > > > >>>> operators' properties will be modified. They will all be modified
> > > > > >>>> consistently at the same window. The tuple can contain more than one
> > > > > >>>> property change for more than one logical operator, and the change
> > > > > >>>> will be applied consistently to the different logical operators at
> > > > > >>>> the same window. In case of failure, the replay of tuples will ensure
> > > > > >>>> that the property change gets reapplied at the correct window.
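A minimal single-process Python model of the in-band idea, assuming a linear chain of operators instead of a full DAG and plain strings as data tuples. `BeginWindow`, `EndWindow`, `PropertyChange`, `Operator` and `run` are hypothetical names, not Apex classes. The example stream changes properties on two logical operators between windows 1 and 2; re-running the same items from the control tuple onward on a freshly restored operator re-applies the change, which is the replay argument made above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class BeginWindow:
    window_id: int


@dataclass(frozen=True)
class EndWindow:
    window_id: int


@dataclass(frozen=True)
class PropertyChange:
    """Hypothetical control tuple: (operator name, property, value) triples."""
    changes: Tuple[Tuple[str, str, str], ...]


Item = Union[BeginWindow, EndWindow, PropertyChange, str]


@dataclass
class Operator:
    name: str
    props: Dict[str, str] = field(default_factory=dict)
    window: int = -1
    # (window id, data tuple, properties in effect) for every data tuple seen
    seen: List[Tuple[int, str, Dict[str, str]]] = field(default_factory=list)

    def process(self, item: Item) -> Item:
        if isinstance(item, PropertyChange):
            # Every operator inspects the tuple but applies only its own entries.
            for op_name, prop, value in item.changes:
                if op_name == self.name:
                    self.props[prop] = value
        elif isinstance(item, BeginWindow):
            self.window = item.window_id
        elif isinstance(item, str):
            self.seen.append((self.window, item, dict(self.props)))
        return item  # forward everything, including the control tuple


def run(pipeline: List[Operator], stream: List[Item]) -> None:
    """Push each item through a linear chain of operators (a simplified DAG)."""
    for item in stream:
        for op in pipeline:
            item = op.process(item)


stream: List[Item] = [
    BeginWindow(1), "a", EndWindow(1),
    # Injected by the input operator between end window 1 and begin window 2.
    PropertyChange((("parser", "regex", "x+"), ("writer", "path", "/out/v2"))),
    BeginWindow(2), "b", EndWindow(2),
]

parser, writer = Operator("parser"), Operator("writer")
run([parser, writer], stream)

# After a failure, an operator restored from a checkpoint taken before the
# change receives the control tuple again when window 2 is replayed.
restored = Operator("parser")
run([restored], stream[3:])
```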
> > > > > >>>>
> > > > > >>>> Please give your feedback and input on what you think about this
> > > > > >>>> proposal.
> > > > > >>>>
> > > > > >>>> Thanks
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
>
