I agree with Vlad: these names are so deeply embedded in the community that changing them is likely to create more problems than it solves.
Ram

On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <[email protected]> wrote:

> I vote to keep original names and educate/explain their meaning to non
> technical audience as delivery guarantee is not specific to Apex, but has
> common meaning for all streaming platforms.
>
> Vlad
>
> On 2/2/16 15:17, Timothy Farkas wrote:
>
>> Could we provide Processing and Output Centric Aliases for the
>> ProcessingModes?
>>
>> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>>
>> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
>> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>>
>> Tim
>>
>> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]>
>> wrote:
>>
>>> Well output guarantees are managed by the operators themselves so the
>>> user will typically not see that as part of the engine features, they
>>> only see processing guarantees and while they are technically correct
>>> as far as individual operators are concerned the names give a
>>> different idea.
>>>
>>> Thanks
>>>
>>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]>
>>> wrote:
>>>
>>>> I think I understand the ambiguity you are trying to clear up Pramod.
>>>> Perhaps it can be disambiguated by distinguishing between Processing
>>>> Guarantees and Output Guarantees, when explaining to people.
>>>> Processing Guarantees apply to the way tuples are transmitted between
>>>> operators. Output Guarantees apply to the way output operators write
>>>> tuples to a Data Sink.
>>>>
>>>> This way we can describe each term intuitively in each context:
>>>>
>>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
>>>> At Least Once: A tuple can be transmitted (written) one or more times.
>>>> Exactly Once: A tuple is transmitted (written) only once.
>>>>
>>>> Then we could provide a table with the strongest Output Guarantee
>>>> that is possible for each Processing Guarantee.
>>>>
>>>> Processing    | Strongest Output Guarantee
>>>> ----------------------------------------------
>>>> At Most Once  | At Most Once
>>>> At Least Once | Exactly Once
>>>> Exactly Once  | Exactly Once
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]>
>>>> wrote:
>>>>
>>>>> I agree with Tim. Instead of new terminologies, better explanation
>>>>> for the existing once are more useful.
>>>>>
>>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The idea is to disambiguate without using at least once since
>>>>>> exactly once output can still be achieved with those. Any other
>>>>>> names are fine, those were just suggestions.
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The new names don't make as much sense to me as the original
>>>>>>> names. The concepts require some thought to understand, and it
>>>>>>> won't necessarily be made easier with a name change. I think a
>>>>>>> better way to attack misunderstandings is to clearly explain what
>>>>>>> a window, operator, input operator, output operator, tuple,
>>>>>>> checkpoint, and DAG is with really clean and simple illustrations
>>>>>>> of the concepts. Then we can explain more involved concepts like
>>>>>>> At Least Once, At Most Once, and Exactly Once with well thought
>>>>>>> illustrations. Without a clear explanation of the basic
>>>>>>> vocabulary, and without pictures, it is difficult to get even
>>>>>>> technical people to understand these concepts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tim
>>>>>>>
>>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Today we support three different processing modes for operators,
>>>>>>>> "at least once", "at most once" and "exactly once" which
>>>>>>>> determine tuple processing and recovery behavior when there is
>>>>>>>> operator recovery from failure. The default being at least once
>>>>>>>> where the tuples are replayed from the recovered checkpoint.
>>>>>>>>
>>>>>>>> At least once works well for most applications. Typically
>>>>>>>> applications persist the final output of processing through the
>>>>>>>> DAG into various outputs like key value stores, databases or
>>>>>>>> even HDFS files. In many of these cases various strategies can
>>>>>>>> be employed to save the data "exactly once" in the output, such
>>>>>>>> as transactions, rewinding, meta data storage, idempotent
>>>>>>>> operations etc. Furthermore the exactly once processing mode,
>>>>>>>> which is a checkpoint performed every window is rarely used. All
>>>>>>>> this leads to confusion especially to somebody new and also
>>>>>>>> makes it difficult to explain these names to less technical
>>>>>>>> audience in meetups and public forums. What I am proposing is
>>>>>>>> only a name change which will make this more intuitive to
>>>>>>>> understand. Something simple like "repeat" for "at least once",
>>>>>>>> "latest" for "at most once" and "repeat latest" for "exactly
>>>>>>>> once" can do the trick.
>>>>>>>>
>>>>>>>> Thanks
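
For reference, the table Tim proposes in the thread (strongest Output Guarantee achievable for each Processing Guarantee) could be sketched in Java roughly as below. This is only an illustration of the mapping discussed here: the names `GuaranteeTable`, `Guarantee`, and `strongestOutput` are hypothetical and are not part of the Apex API, and the enum is a stand-in for the engine's own ProcessingMode.

```java
import java.util.EnumMap;
import java.util.Map;

class GuaranteeTable {
    // Hypothetical stand-in for the three modes named in the thread;
    // this is NOT the engine's actual ProcessingMode enum.
    enum Guarantee { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

    // Processing guarantee -> strongest output guarantee, as in Tim's table.
    private static final Map<Guarantee, Guarantee> STRONGEST_OUTPUT =
            new EnumMap<>(Guarantee.class);

    static {
        STRONGEST_OUTPUT.put(Guarantee.AT_MOST_ONCE, Guarantee.AT_MOST_ONCE);
        // At-least-once processing can still yield exactly-once output when
        // the output operator uses transactions, idempotent writes, etc.
        STRONGEST_OUTPUT.put(Guarantee.AT_LEAST_ONCE, Guarantee.EXACTLY_ONCE);
        STRONGEST_OUTPUT.put(Guarantee.EXACTLY_ONCE, Guarantee.EXACTLY_ONCE);
    }

    // Looks up the strongest output guarantee for a processing guarantee.
    static Guarantee strongestOutput(Guarantee processing) {
        return STRONGEST_OUTPUT.get(processing);
    }

    public static void main(String[] args) {
        // Print one row per processing guarantee.
        for (Guarantee g : Guarantee.values()) {
            System.out.println(g + " | " + strongestOutput(g));
        }
    }
}
```

A lookup like this only documents what is possible; whether an application actually reaches the stronger output guarantee still depends on how its output operator is written.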
