I agree with Vlad too.

Thanks,
Amol

On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <[email protected]> wrote:

> I agree with Vlad: these names are so deeply embedded in the community
> that changing them is likely to create more problems than it solves.
>
> Ram
>
> On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <[email protected]> wrote:
>
> > I vote to keep the original names and educate/explain their meaning
> > to a non-technical audience, as delivery guarantee is not specific to
> > Apex but has a common meaning for all streaming platforms.
> >
> > Vlad
> >
> > On 2/2/16 15:17, Timothy Farkas wrote:
> >
> >> Could we provide Processing and Output Centric Aliases for the
> >> ProcessingModes?
> >>
> >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> >>
> >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> >>
> >> Tim
> >>
> >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]> wrote:
> >>
> >>> Well, output guarantees are managed by the operators themselves, so
> >>> the user will typically not see that as part of the engine
> >>> features. They only see processing guarantees, and while those are
> >>> technically correct as far as individual operators are concerned,
> >>> the names give a different idea.
> >>>
> >>> Thanks
> >>>
> >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]> wrote:
> >>>
> >>>> I think I understand the ambiguity you are trying to clear up,
> >>>> Pramod. Perhaps it can be disambiguated by distinguishing between
> >>>> Processing Guarantees and Output Guarantees when explaining to
> >>>> people. Processing Guarantees apply to the way tuples are
> >>>> transmitted between operators. Output Guarantees apply to the way
> >>>> output operators write tuples to a Data Sink.
> >>>>
> >>>> This way we can describe each term intuitively in each context:
> >>>>
> >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> >>>> Exactly Once: A tuple is transmitted (written) only once.
> >>>>
> >>>> Then we could provide a table with the strongest Output Guarantee
> >>>> that is possible for each Processing Guarantee.
> >>>>
> >>>> Processing    | Strongest Output Guarantee
> >>>> ----------------------------------------------
> >>>> At Most Once  | At Most Once
> >>>> At Least Once | Exactly Once
> >>>> Exactly Once  | Exactly Once
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks,
> >>>> Tim
> >>>>
> >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]> wrote:
> >>>>
> >>>>> I agree with Tim. Instead of new terminology, better explanations
> >>>>> of the existing names are more useful.
> >>>>>
> >>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]> wrote:
> >>>>>
> >>>>>> The idea is to disambiguate without using at least once, since
> >>>>>> exactly once output can still be achieved with those. Any other
> >>>>>> names are fine, those were just suggestions.
> >>>>>>
> >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]> wrote:
> >>>>>>
> >>>>>>> The new names don't make as much sense to me as the original
> >>>>>>> names. The concepts require some thought to understand, and
> >>>>>>> that won't necessarily be made easier with a name change. I
> >>>>>>> think a better way to attack misunderstandings is to clearly
> >>>>>>> explain what a window, operator, input operator, output
> >>>>>>> operator, tuple, checkpoint, and DAG are, with really clean and
> >>>>>>> simple illustrations of the concepts. Then we can explain more
> >>>>>>> involved concepts like At Least Once, At Most Once, and Exactly
> >>>>>>> Once with well-thought-out illustrations. Without a clear
> >>>>>>> explanation of the basic vocabulary, and without pictures, it
> >>>>>>> is difficult to get even technical people to understand these
> >>>>>>> concepts.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Tim
> >>>>>>>
> >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Today we support three different processing modes for
> >>>>>>>> operators, "at least once", "at most once" and "exactly once",
> >>>>>>>> which determine tuple processing and recovery behavior when an
> >>>>>>>> operator recovers from failure. The default is at least once,
> >>>>>>>> where the tuples are replayed from the recovered checkpoint.
> >>>>>>>>
> >>>>>>>> At least once works well for most applications. Typically
> >>>>>>>> applications persist the final output of processing through
> >>>>>>>> the DAG into various outputs like key value stores, databases
> >>>>>>>> or even HDFS files. In many of these cases various strategies
> >>>>>>>> can be employed to save the data "exactly once" in the output,
> >>>>>>>> such as transactions, rewinding, metadata storage, idempotent
> >>>>>>>> operations etc. Furthermore, the exactly once processing mode,
> >>>>>>>> which is a checkpoint performed every window, is rarely used.
> >>>>>>>> All this leads to confusion, especially for somebody new, and
> >>>>>>>> also makes it difficult to explain these names to a less
> >>>>>>>> technical audience in meetups and public forums. What I am
> >>>>>>>> proposing is only a name change which will make this more
> >>>>>>>> intuitive to understand. Something simple like "repeat" for
> >>>>>>>> "at least once", "latest" for "at most once" and "repeat
> >>>>>>>> latest" for "exactly once" can do the trick.
> >>>>>>>>
> >>>>>>>> Thanks
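
For readers following the thread, here is a rough Java sketch of two points made above: Tim's table of the strongest output guarantee for each processing guarantee, and Pramod's remark that idempotent operations can give "exactly once" output on top of at least once processing. Everything in it is hypothetical and for illustration only: `GuaranteeSketch`, `Guarantee`, `Tuple`, `strongestOutput` and `writeAll` are made-up names, not the Apex `ProcessingMode` API, and an in-memory map stands in for a real key value store.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GuaranteeSketch {

    /** Hypothetical enum for this sketch; not the Apex ProcessingMode type. */
    public enum Guarantee { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

    /** Hypothetical tuple carrying a unique id that the sink can use as a key. */
    public static final class Tuple {
        final long id;
        final String value;

        public Tuple(long id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Strongest output guarantee for a processing guarantee, following Tim's table. */
    public static Guarantee strongestOutput(Guarantee processing) {
        switch (processing) {
            case AT_MOST_ONCE:
                return Guarantee.AT_MOST_ONCE;
            default:
                // Both AT_LEAST_ONCE and EXACTLY_ONCE can reach exactly once output.
                return Guarantee.EXACTLY_ONCE;
        }
    }

    /**
     * Idempotent write: the store is keyed by tuple id, so a replayed tuple
     * overwrites its earlier copy instead of adding a duplicate.
     */
    public static Map<Long, String> writeAll(List<Tuple> delivered) {
        Map<Long, String> store = new LinkedHashMap<>();
        for (Tuple t : delivered) {
            store.put(t.id, t.value);
        }
        return store;
    }

    public static void main(String[] args) {
        // Tuples 2 and 3 arrive twice, as if replayed from a checkpoint after a failure.
        List<Tuple> delivered = Arrays.asList(
                new Tuple(1, "a"), new Tuple(2, "b"), new Tuple(3, "c"),
                new Tuple(2, "b"), new Tuple(3, "c"));

        System.out.println(strongestOutput(Guarantee.AT_LEAST_ONCE));
        System.out.println(writeAll(delivered));
    }
}
```

The keyed `put` is only one of the strategies mentioned in the thread; transactions or stored metadata about the last written window would be alternatives, and which one fits depends on the sink.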
