I agree with Vlad. It would be good to have a clear explanation, with examples, of the existing names. That avoids confusing people who already know them, and it helps beginners too.
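
As a starting point for such an example, here is a rough sketch of how "at least once" processing can still give "exactly once" output when the output operator writes idempotently. It is a plain Java simulation, not Apex code: the class DeliveryDemo, its methods, and the checkpoint/failure parameters are all made up for illustration, and a real operator would rely on the platform's own checkpointing rather than a loop counter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation for illustration only; none of these names are Apex APIs.
class DeliveryDemo {

    /**
     * Emits tuple ids 0..count-1. A checkpoint is taken every checkpointEvery tuples.
     * One failure is simulated after failAfter tuples; recovery rewinds to the last
     * checkpoint and replays from there, which is the "at least once" behavior.
     */
    static List<Integer> emitAtLeastOnce(int count, int checkpointEvery, int failAfter) {
        List<Integer> emitted = new ArrayList<>();
        int checkpoint = 0;      // first tuple id not yet covered by a checkpoint
        boolean failed = false;  // the failure is simulated only once
        int i = 0;
        while (i < count) {
            emitted.add(i);
            i++;
            if (i % checkpointEvery == 0) {
                checkpoint = i;
            }
            if (!failed && i == failAfter) {
                failed = true;
                i = checkpoint;  // recover: replay everything after the checkpoint
            }
        }
        return emitted;
    }

    /** Naive sink: appends every tuple it receives, so replayed tuples become duplicates. */
    static List<Integer> writeNaive(List<Integer> tuples) {
        return new ArrayList<>(tuples);
    }

    /** Idempotent sink: keyed by tuple id, so writing a replayed tuple again changes nothing. */
    static Map<Integer, String> writeIdempotent(List<Integer> tuples) {
        Map<Integer, String> store = new LinkedHashMap<>();
        for (int id : tuples) {
            store.put(id, "tuple-" + id);
        }
        return store;
    }

    public static void main(String[] args) {
        List<Integer> emitted = emitAtLeastOnce(6, 3, 5);
        System.out.println("emitted:    " + emitted);
        System.out.println("naive sink: " + writeNaive(emitted).size() + " writes");
        System.out.println("idempotent: " + writeIdempotent(emitted).keySet());
    }
}
```

With these inputs, tuples 3 and 4 are emitted twice because they came after the last checkpoint when the failure hit. The naive sink records 8 writes for 6 tuples, while the keyed sink ends up holding each of the 6 tuples once. The processing guarantee and the output guarantee are separate things, and the existing names describe the first one.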

On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <[email protected]> wrote:

> I agree with Vlad too.
>
> Thks
> Amol
>
> On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <[email protected]> wrote:
>
>> I agree with Vlad: these names are so deeply embedded in the community
>> that changing them is likely to create more problems than it solves.
>>
>> Ram
>>
>> On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <[email protected]> wrote:
>>
>>> I vote to keep original names and educate/explain their meaning to non
>>> technical audience as delivery guarantee is not specific to Apex, but
>>> has common meaning for all streaming platforms.
>>>
>>> Vlad
>>>
>>> On 2/2/16 15:17, Timothy Farkas wrote:
>>>
>>>> Could we provide Processing and Output Centric Aliases for the
>>>> ProcessingModes?
>>>>
>>>> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
>>>> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>>>>
>>>> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
>>>> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
>>>> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>>>>
>>>> Tim
>>>>
>>>> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]> wrote:
>>>>
>>>>> Well output guarantees are managed by the operators themselves so the
>>>>> user will typically not see that as part of the engine features, they
>>>>> only see processing guarantees and while they are technically correct
>>>>> as far as individual operators are concerned the names give a
>>>>> different idea.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]> wrote:
>>>>>
>>>>>> I think I understand the ambiguity you are trying to clear up Pramod.
>>>>>> Perhaps it can be disambiguated by distinguishing between Processing
>>>>>> Guarantees and Output Guarantees, when explaining to people.
>>>>>> Processing Guarantees apply to the way tuples are transmitted between
>>>>>> operators. Output Guarantees apply to the way output operators write
>>>>>> tuples to a Data Sink.
>>>>>>
>>>>>> This way we can describe each term intuitively in each context:
>>>>>>
>>>>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
>>>>>> At Least Once: A tuple can be transmitted (written) one or more times.
>>>>>> Exactly Once: A tuple is transmitted (written) only once.
>>>>>>
>>>>>> Then we could provide a table with the strongest Output Guarantee
>>>>>> that is possible for each Processing Guarantee.
>>>>>>
>>>>>> Processing    | Strongest Output Guarantee
>>>>>> ----------------------------------------------
>>>>>> At Most Once  | At Most Once
>>>>>> At Least Once | Exactly Once
>>>>>> Exactly Once  | Exactly Once
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Tim
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]> wrote:
>>>>>>
>>>>>>> I agree with Tim. Instead of new terminologies, better explanation
>>>>>>> for the existing once are more useful.
>>>>>>>
>>>>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]> wrote:
>>>>>>>
>>>>>>>> The idea is to disambiguate without using at least once since
>>>>>>>> exactly once output can still be achieved with those. Any other
>>>>>>>> names are fine, those were just suggestions.
>>>>>>>>
>>>>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The new names don't make as much sense to me as the original
>>>>>>>>> names. The concepts require some thought to understand, and it
>>>>>>>>> won't necessarily be made easier with a name change. I think a
>>>>>>>>> better way to attack misunderstandings is to clearly explain what
>>>>>>>>> a window, operator, input operator, output operator, tuple,
>>>>>>>>> checkpoint, and DAG is with really clean and simple illustrations
>>>>>>>>> of the concepts. Then we can explain more involved concepts like
>>>>>>>>> At Least Once, At Most Once, and Exactly Once with well thought
>>>>>>>>> illustrations. Without a clear explanation of the basic
>>>>>>>>> vocabulary, and without pictures, it is difficult to get even
>>>>>>>>> technical people to understand these concepts.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Today we support three different processing modes for operators,
>>>>>>>>>> "at least once", "at most once" and "exactly once" which
>>>>>>>>>> determine tuple processing and recovery behavior when there is
>>>>>>>>>> operator recovery from failure. The default being at least once
>>>>>>>>>> where the tuples are replayed from the recovered checkpoint.
>>>>>>>>>>
>>>>>>>>>> At least once works well for most applications. Typically
>>>>>>>>>> applications persist the final output of processing through the
>>>>>>>>>> DAG into various outputs like key value stores, databases or
>>>>>>>>>> even HDFS files. In many of these cases various strategies can
>>>>>>>>>> be employed to save the data "exactly once" in the output, such
>>>>>>>>>> as transactions, rewinding, meta data storage, idempotent
>>>>>>>>>> operations etc. Furthermore the exactly once processing mode,
>>>>>>>>>> which is a checkpoint performed every window is rarely used. All
>>>>>>>>>> this leads to confusion especially to somebody new and also
>>>>>>>>>> makes it difficult to explain these names to less technical
>>>>>>>>>> audience in meetups and public forums.
>>>>>>>>>>
>>>>>>>>>> What I am proposing is only a name change which will make this
>>>>>>>>>> more intuitive to understand. Something simple like "repeat" for
>>>>>>>>>> "at least once", "latest" for "at most once" and "repeat latest"
>>>>>>>>>> for "exactly once" can do the trick.
>>>>>>>>>>
>>>>>>>>>> Thanks
