Re: dynamic application properties proposal

Amol Kekre Sun, 04 Oct 2015 22:40:30 -0700

Pause is hard to pull off. It has a lot of other side effects and
consequences, both on scale and on external systems that now have to back
up. As the number of operators grows, the algorithm halts more. Because the
data is in motion, a backlog will build up during the pause, especially
within external systems. The problem occurs even if we have one logical
input operator with N partitions.


A much quicker way, though one without a technical guarantee, would be to
let users pick a window id increment into the future. The command may then
be "let me set properties on these operators N windows in the future off the
current max window id among them". A user can then use a high enough N to
get 99.99% certainty that the window is aligned.
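
A minimal sketch of that command, assuming window ids can be treated as
plain increasing integers (real Apex window ids may be encoded differently).
The names schedule_property_change and is_aligned are hypothetical and are
not part of any Apex API:

```python
def schedule_property_change(current_window_ids, n_ahead):
    """Return the window id at which every listed operator applies the change.

    current_window_ids maps an operator name to the window id it last
    reported; n_ahead is the user-chosen safety margin N.
    """
    if not current_window_ids:
        raise ValueError("at least one operator is required")
    if n_ahead < 1:
        raise ValueError("n_ahead must be positive")
    return max(current_window_ids.values()) + n_ahead


def is_aligned(latest_window_ids, target_window):
    """True if no operator had already begun the target window when the
    command reached it, so the change lands on the same window everywhere."""
    return all(w < target_window for w in latest_window_ids.values())


reported = {"input-1": 100, "input-2": 104, "parser": 97}
target = schedule_property_change(reported, n_ahead=20)
```

The larger N is, the more likely is_aligned holds by the time the command
reaches every operator, at the cost of a longer delay before the change
takes effect.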

Thks,
Amol


On Sat, Oct 3, 2015 at 9:55 AM, Timothy Farkas <[email protected]> wrote:

> The case where there is no common ancestor also has to be handled. For
> example you may need to change a property on two different input operators.
> In this case the property needs to be set on both operators before the same
> window. This also needs to be done the first time a window is computed by
> an input operator, otherwise there would be issues with idempotence. This
> could be achieved by doing the following:
>
> 1. The input operators are paused when setting a property.
> 2. They report their window ids.
> 3. The max window id is picked.
> 4. The property is scheduled to be set at the appropriate
> window.
> 5. The input operators are resumed.
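
The five steps quoted above can be sketched roughly as follows. InputOperator
and set_property_on_inputs are hypothetical stand-ins, not Apex classes, and
taking "the appropriate window" to be max + 1 (the first window that no input
operator has computed yet) is an assumption:

```python
class InputOperator:
    """Hypothetical stand-in for an input operator; not an Apex class."""

    def __init__(self, name, window_id):
        self.name = name
        self.window_id = window_id
        self.paused = False
        self.properties = {}
        self.scheduled = {}  # window id -> {property name: value}

    def pause(self):
        self.paused = True
        return self.window_id  # step 2: report the current window id

    def schedule(self, window_id, changes):
        self.scheduled.setdefault(window_id, {}).update(changes)

    def resume(self):
        self.paused = False

    def begin_window(self, window_id):
        if self.paused:
            raise RuntimeError(f"{self.name} is paused")
        # Apply scheduled changes before any tuple of this window is emitted.
        self.properties.update(self.scheduled.pop(window_id, {}))
        self.window_id = window_id


def set_property_on_inputs(operators, changes):
    try:
        reported = [op.pause() for op in operators]  # steps 1 and 2
        target = max(reported) + 1                   # step 3 (+1 is assumed)
        for op in operators:
            op.schedule(target, changes)             # step 4
    finally:
        for op in operators:
            op.resume()                              # step 5
    return target


a = InputOperator("a", 10)
b = InputOperator("b", 12)
target = set_property_on_inputs([a, b], {"regex": "x+"})
```

Both operators pick up the new value the first time they begin the target
window, which is the idempotence condition described above.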
>
> Thanks,
> Tim
> On Oct 1, 2015 5:39 PM, "Amol Kekre" <[email protected]> wrote:
>
> >
> > The issue comes up when a property has to be changed in multiple operators,
> > logical or physical. Since it does not matter if this is triggered by an
> > input adapter or any parent of these operators, stram can pick a common
> > ancestor. Property change commands (operator id, prop name, prop val) can
> > be inserted by the stramchild of the common ancestor.
> >
> > Thks
> > Amol
> >
> > Sent from my iPhone
> >
> > > On Oct 1, 2015, at 2:13 PM, Gaurav Gupta <[email protected]>
> wrote:
> > >
> > > Pramod,
> > >
> > > The new special property change tuple will be sent to all the operators
> > and all the operators will have to check whether the property change
> > applies to them. Such requests may be very few, but is there a
> > way to optimize it?
> > >
> > > Thanks
> > > - Gaurav
> > >
> > >> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <[email protected]>
> > wrote:
> > >>
> > >> At the platform level that cannot be guaranteed as your operator
> > controls
> > >> and manages reading of the data. However it is not difficult to
> envision
> > >> writing an operator that would pick up a new dataset when a property is
> > >> changed.
> > >>
> > >> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> > >> [email protected]> wrote:
> > >>
> > >>> Great, looking forward to these changes. Does it also provide a
> > guarantee
> > >>> on which properties are used for which input data sets?
> > >>>
> > >>> A few use case examples:
> > >>> - set property between reads of different batches of files. Say,
> > applying
> > >>> batch name property before processing the next batch of files.
> > >>> - load new configuration file for csv parser before processing next
> > set of
> > >>> data.
> > >>> - apply new regex before parsing next stream of tuples.
> > >>> etc.
> > >>>
> > >>> One approach to allow this is to emit subsequent tuples only starting
> > from the
> > >>> window after the window in which the property change is made. That way,
> the
> > >>> boundaries between data sets are fixed and the property change is done in
> > >>> between. The user will now have a guarantee on which property value
> is
> > used
> > >>> on any given tuple.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> Regards,
> > >>> Ashwin.
> > >>>
> > >>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <
> > [email protected]>
> > >>> wrote:
> > >>>
> > >>>> Apex supports modification of operator properties at runtime but the
> > >>> current
> > >>>> implementation has the following shortcomings.
> > >>>>
> > >>>> 1. Property is not set across all partitions on the same window as
> > >>>> individual partitions can be on different windows when property
> > change is
> > >>>> initiated from client resulting in inconsistency of data for those
> > >>> windows.
> > >>>> I am being generous using the word inconsistent.
> > >>>> 2. Sometimes properties need to be set on more than one logical
> > operator
> > >>>> at the same time to achieve the change the user is seeking. Today
> they
> > >>> will
> > >>>> be two separate changes happening on two different windows again
> > >>> resulting
> > >>>> in inconsistent data for some windows. These would need to happen
> as a
> > >>>> single transaction.
> > >>>> 3. If there is an operator failure before a committed checkpoint
> > after an
> > >>>> operator property is dynamically changed the operator will restart
> > with
> > >>> the
> > >>>> old property and the change will not be re-applied.
> > >>>>
> > >>>> Tim and I did some brainstorming and we have a proposal to
> > overcome
> > >>>> these shortcomings. The main problem in all the above cases is that
> > the
> > >>>> property changes are happening out-of-band of data flow and hence
> > >>>> independent of windowing. The proposal is to bring the property
> change
> > >>>> request into the in-band dataflow so that they are handled
> > consistently
> > >>>> with windowing and handled distributively.
> > >>>>
> > >>>> The idea is to inject a special property change tuple containing the
> > >>>> property changes and the identification information of the
> operators
> > >>> they
> > >>>> affect into the dataflow at the input operator. The tuple will be
> > >>> injected
> > >>>> at the window boundary, after end window and before begin window, and as
> > this
> > >>>> tuple flows through the DAG the intended operators' properties will
> be
> > >>>> modified. They will all be modified consistently at the same window.
> > The
> > >>>> tuple can contain more than one property change for more than one
> > >>> logical
> > >>>> operator and the changes will be applied consistently to the
> different
> > >>>> logical operators at the same window. In case of failure the replay
> of
> > >>>> tuples will ensure that the property change gets reapplied at the
> > correct
> > >>>> window.
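
A toy illustration of the in-band idea quoted above, under stated
assumptions: a stream is modeled as a Python list of (kind, payload) events,
and TaggingOperator and inject_at_boundary are hypothetical names, not Apex
classes. It only shows why a control tuple placed between end window and
begin window makes every operator switch on the same window:

```python
BEGIN, END, DATA, CONTROL = "begin", "end", "data", "control"


class TaggingOperator:
    """Appends its current 'tag' property to every data tuple it sees."""

    def __init__(self, name, tag):
        self.name = name
        self.properties = {"tag": tag}

    def process(self, stream):
        out = []
        for kind, payload in stream:
            if kind == CONTROL:
                # Apply only the changes addressed to this operator.
                for target, prop, value in payload:
                    if target == self.name:
                        self.properties[prop] = value
                out.append((kind, payload))  # forward downstream unchanged
            elif kind == DATA:
                tag = f"{self.name}={self.properties['tag']}"
                out.append((DATA, payload + (tag,)))
            else:
                out.append((kind, payload))
        return out


def inject_at_boundary(stream, before_window, changes):
    """Insert the control tuple just before BEGIN of the given window."""
    out = []
    for kind, payload in stream:
        if kind == BEGIN and payload == before_window:
            out.append((CONTROL, changes))
        out.append((kind, payload))
    return out


stream = [(BEGIN, 1), (DATA, ("x",)), (END, 1),
          (BEGIN, 2), (DATA, ("y",)), (END, 2)]
changes = (("a", "tag", "new"), ("b", "tag", "new"))
injected = inject_at_boundary(stream, 2, changes)

a, b = TaggingOperator("a", "old"), TaggingOperator("b", "old")
result = [p for k, p in b.process(a.process(injected)) if k == DATA]
```

Because the change travels inside the stream, replaying the same stream
through freshly restarted operators reproduces the same output, which is the
recovery behavior the proposal relies on.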
> > >>>>
> > >>>> Please give your feedback and input on what you think about this
> > >>> proposal.
> > >>>>
> > >>>> Thanks
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Regards,
> > >>> Ashwin.
> > >
> >
>
