+1 for adding detailed explanations of these concepts to the tutorials.
On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar wrote:

+1 for Vlad's suggestion. Searching for keywords like "at least once", "at most once" and "exactly once" shows that these terms are widely used wherever tuple-processing semantics are defined. Adding example applications for each of them would help in explaining the terminology in the Apex context.

On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh wrote:

I do agree with Vlad. It would be good to have a clear explanation, with examples, of the existing names. That will not create confusion for those who already know them, and it will also help beginners.

On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre wrote:

I agree with Vlad too.

Thks
Amol

On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath wrote:

I agree with Vlad: these names are so deeply embedded in the community that changing them is likely to create more problems than it solves.

Ram

On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov wrote:

I vote to keep the original names and educate/explain their meaning to non-technical audiences, as delivery guarantees are not specific to Apex but have a common meaning across all streaming platforms.

Vlad

On 2/2/16 15:17, Timothy Farkas wrote:

Could we provide processing-centric and output-centric aliases for the ProcessingModes?
    ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
    ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE

    ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
    ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
    ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE

Tim

On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni wrote:

Well, output guarantees are managed by the operators themselves, so users will typically not see them as part of the engine features; they only see processing guarantees. And while the names are technically correct as far as individual operators are concerned, they give a different idea.

Thanks

On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas wrote:

I think I understand the ambiguity you are trying to clear up, Pramod. Perhaps it can be disambiguated by distinguishing between Processing Guarantees and Output Guarantees when explaining to people. Processing Guarantees apply to the way tuples are transmitted between operators. Output Guarantees apply to the way output operators write tuples to a data sink.

This way we can describe each term intuitively in each context:

At Most Once: A tuple may be dropped, or transmitted (written) only once.
At Least Once: A tuple may be transmitted (written) one or more times.
Exactly Once: A tuple is transmitted (written) only once.
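To make these three definitions concrete, here is a toy Java model. It is an illustration only, not Apex engine code: the class, the tuple numbers and the crash point are all hypothetical. Tuples 1 to 6 flow into an operator that checkpoints after tuple 2, crashes right after processing tuple 4, and misses tuple 5 while it is down. Exactly once is modeled, following the "checkpoint performed every window" description in the original proposal, as a checkpoint after every tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the three processing guarantees; not Apex code.
public class DeliverySemantics {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

    static final int CHECKPOINT_AT = 2; // periodic checkpoint for the non-exactly-once modes
    static final int CRASH_AFTER = 4;   // operator fails right after processing this tuple
    static final int LAST_TUPLE = 6;    // tuple 5 arrives while the operator is down

    /** Returns the tuples the operator processes, in order, including any replays. */
    static List<Integer> run(Mode mode) {
        List<Integer> processed = new ArrayList<>();
        int checkpoint = 0; // last tuple whose effect is captured by a checkpoint

        for (int t = 1; t <= CRASH_AFTER; t++) {
            processed.add(t);
            // EXACTLY_ONCE checkpoints after every tuple (standing in for "every window").
            if (mode == Mode.EXACTLY_ONCE || t == CHECKPOINT_AT) {
                checkpoint = t;
            }
        }

        // Recovery: AT_MOST_ONCE skips ahead to the newest tuple, so tuple 5 is lost;
        // the other modes replay everything after the last checkpoint.
        int resumeFrom = (mode == Mode.AT_MOST_ONCE) ? LAST_TUPLE : checkpoint + 1;
        for (int t = resumeFrom; t <= LAST_TUPLE; t++) {
            processed.add(t);
        }
        return processed;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + run(mode));
        }
    }
}
```

Running `main` prints `[1, 2, 3, 4, 6]` for AT_MOST_ONCE (tuple 5 is dropped), `[1, 2, 3, 4, 3, 4, 5, 6]` for AT_LEAST_ONCE (tuples 3 and 4 are processed twice) and `[1, 2, 3, 4, 5, 6]` for EXACTLY_ONCE. The only difference between the last two modes in this model is how often the checkpoint is taken.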
Then we could provide a table with the strongest Output Guarantee that is possible for each Processing Guarantee:

Processing Guarantee | Strongest Output Guarantee
---------------------|---------------------------
At Most Once         | At Most Once
At Least Once        | Exactly Once
Exactly Once         | Exactly Once

Thoughts?

Thanks,
Tim

On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde wrote:

I agree with Tim. Instead of new terminology, better explanations of the existing terms would be more useful.

On Tue, Feb 2, 2016 at 2:23 PM, Pramod Immaneni wrote:

The idea is to disambiguate without using "at least once", since exactly-once output can still be achieved with that mode. Any other names are fine; those were just suggestions.

On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas wrote:

The new names don't make as much sense to me as the original names. The concepts require some thought to understand, and a name change won't necessarily make them easier.
I think a better way to attack misunderstandings is to clearly explain what a window, operator, input operator, output operator, tuple, checkpoint, and DAG are, with really clean and simple illustrations of the concepts. Then we can explain more involved concepts like At Least Once, At Most Once, and Exactly Once with well-thought-out illustrations. Without a clear explanation of the basic vocabulary, and without pictures, it is difficult to get even technical people to understand these concepts.

Thanks,
Tim

On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni wrote:

Today we support three different processing modes for operators, "at least once", "at most once" and "exactly once", which determine tuple processing and recovery behavior when an operator recovers from failure. The default is at least once, where tuples are replayed from the recovered checkpoint.

At least once works well for most applications.
Typically, applications persist the final output of processing through the DAG to various outputs like key-value stores, databases or even HDFS files. In many of these cases, various strategies can be employed to save the data "exactly once" in the output, such as transactions, rewinding, metadata storage, idempotent operations, etc. Furthermore, the exactly once processing mode, which performs a checkpoint every window, is rarely used. All this leads to confusion, especially for somebody new, and also makes it difficult to explain these names to less technical audiences at meetups and in public forums.

What I am proposing is only a name change that will make this more intuitive to understand. Something simple like "repeat" for "at least once", "latest" for "at most once" and "repeat latest" for "exactly once" can do the trick.

Thanks
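The "At Least Once | Exactly Once" row of Tim's table, combined with the metadata-storage and idempotent-write strategies mentioned in the proposal, can be sketched as follows. This is a hypothetical model, not an Apex or Malhar API: `Store` stands in for an external sink, and the output operator records the last committed window ID in that store together with the rows. A restarted operator can then recognize windows replayed from the checkpoint and skip them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: at-least-once replay + window-ID bookkeeping = exactly-once output.
public class ExactlyOnceOutput {
    /** Stand-in for an external sink; unlike operator state, it survives the crash. */
    static class Store {
        final List<String> rows = new ArrayList<>();
        long committedWindowId = -1; // kept in the store, updated together with the rows
    }

    /** Hypothetical output operator; a fresh instance is created after recovery. */
    static class OutputOperator {
        private final Store store;
        private final boolean skipCommitted;

        OutputOperator(Store store, boolean skipCommitted) {
            this.store = store;
            this.skipCommitted = skipCommitted;
        }

        void writeWindow(long windowId, List<String> tuples) {
            if (skipCommitted && windowId <= store.committedWindowId) {
                return; // replayed window: its rows are already in the store
            }
            // A real store would apply these two updates in one transaction.
            store.rows.addAll(tuples);
            store.committedWindowId = windowId;
        }
    }

    static final List<List<String>> WINDOWS = Arrays.asList(
            Arrays.asList("a", "b"), // window 0
            Arrays.asList("c"),      // window 1
            Arrays.asList("d", "e"), // window 2
            Arrays.asList("f"));     // window 3

    static List<String> simulate(boolean skipCommitted) {
        Store store = new Store();
        OutputOperator op = new OutputOperator(store, skipCommitted);
        for (int w = 0; w <= 2; w++) {
            op.writeWindow(w, WINDOWS.get(w)); // windows 0..2 are written, then the operator crashes
        }
        // At-least-once recovery: the last checkpoint covered window 0,
        // so the restarted operator receives windows 1..3 again.
        OutputOperator restarted = new OutputOperator(store, skipCommitted);
        for (int w = 1; w <= 3; w++) {
            restarted.writeWindow(w, WINDOWS.get(w));
        }
        return store.rows;
    }

    public static void main(String[] args) {
        System.out.println("naive:      " + simulate(false));
        System.out.println("idempotent: " + simulate(true));
    }
}
```

With `skipCommitted` set to false, the replay of windows 1 and 2 duplicates rows (`[a, b, c, d, e, c, d, e, f]`); with it set to true, each row is written once (`[a, b, c, d, e, f]`). The committed window ID lives in the external store rather than in operator memory because operator state is rolled back to the checkpoint on recovery.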
