We should stick with the standard terminology but make sure the differences
are well explained. That is necessary because other platforms use the same
words with different meanings; compare Storm, Spark Streaming and Flink.

Take "exactly once" as an example. Elsewhere you will find it claimed when
what is really provided is "at least once": events are replayed and
computation is repeated. Only when all operations in the overall system are
idempotent is it possible to avoid effects such as double counting,
duplicate web service calls or duplicate rows in the database. Hence the
engine alone cannot claim to support "exactly once"; the claim is only valid
when the operators used in the application collectively support it.

In Apex, the engine provides the hooks (endWindow, committed) to achieve
idempotency in operators that have an effect on external systems. Several
operator implementations are available that can be used with the
at-least-once processing mode and that deliver "exactly once" for the
application when all operations in the DAG are idempotent.
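
To make the idea concrete, here is a rough sketch of how an output operator
can stay idempotent under replay. It is plain Java and does not use the Apex
API: FakeStore, IdempotentWriter and IdempotentSinkDemo are hypothetical
names, and the method names only mirror the window callbacks mentioned
above. The assumption is that the external system can store the id of the
last written window atomically with the data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical external system that stores a window id together with the data. */
class FakeStore {
    final List<String> rows = new ArrayList<>();
    long lastCommittedWindow = -1;

    // Writes the batch and the window id in one step, as a transaction would.
    void commit(long windowId, List<String> batch) {
        rows.addAll(batch);
        lastCommittedWindow = windowId;
    }
}

/** Hypothetical output operator; not the Apex operator interface. */
class IdempotentWriter {
    private final FakeStore store;
    private final List<String> buffer = new ArrayList<>();
    private long currentWindow;
    private boolean skip;

    IdempotentWriter(FakeStore store) {
        this.store = store;
    }

    void beginWindow(long windowId) {
        currentWindow = windowId;
        // A window at or below the stored id was already written before the failure.
        skip = windowId <= store.lastCommittedWindow;
        buffer.clear();
    }

    void process(String tuple) {
        if (!skip) {
            buffer.add(tuple);
        }
    }

    void endWindow() {
        if (!skip) {
            store.commit(currentWindow, new ArrayList<>(buffer));
        }
    }
}

class IdempotentSinkDemo {
    // Feeds one complete window of tuples to the writer.
    static void play(IdempotentWriter writer, long windowId, String... tuples) {
        writer.beginWindow(windowId);
        for (String tuple : tuples) {
            writer.process(tuple);
        }
        writer.endWindow();
    }

    static FakeStore run() {
        FakeStore store = new FakeStore();
        IdempotentWriter writer = new IdempotentWriter(store);
        play(writer, 1, "a", "b");
        play(writer, 2, "c");

        // Simulated failure: a fresh operator instance gets windows 1 and 2
        // replayed (at least once) and then continues with window 3.
        IdempotentWriter recovered = new IdempotentWriter(store);
        play(recovered, 1, "a", "b");
        play(recovered, 2, "c");
        play(recovered, 3, "d");
        return store;
    }

    public static void main(String[] args) {
        System.out.println(run().rows);
    }
}
```

The replayed windows are processed again by the operator, so the processing
guarantee is still at least once, but the store ends up with each row a
single time. The sketch relies on replayed windows carrying the same tuples
as the original run.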





On Tue, Feb 2, 2016 at 10:26 PM, Shubham Pathak <[email protected]>
wrote:

> +1 for adding detailed explanation about the concepts in tutorials.
>
>
> On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <
> [email protected]>
> wrote:
>
> > +1 for Vlad's suggestion. Searching for keywords like "at least once",
> > "at most once" and "exactly once" shows that these terms are widely
> > used wherever semantics are defined for tuple processing.
> > Adding example applications for each of them would help explain the
> > terms in the Apex context.
> >
> > On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <
> [email protected]
> > >
> > wrote:
> >
> > > I do agree with Vlad. It would be good to have a clear explanation with
> > > examples for the existing names, as that will not create confusion for
> > > those who already know them or for those who are beginners.
> > >
> > > On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <[email protected]>
> wrote:
> > >
> > > > I agree with Vlad too.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > >
> > > > On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > I agree with Vlad: these names are so deeply embedded in the
> > community
> > > > that
> > > > > changing them is likely
> > > > > to create more problems than it solves.
> > > > >
> > > > > Ram
> > > > >
> > > > > On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > I vote to keep original names and educate/explain their meaning
> to
> > > non
> > > > > > technical audience as delivery guarantee is not specific to Apex,
> > but
> > > > has
> > > > > > common meaning for all streaming platforms.
> > > > > >
> > > > > > Vlad
> > > > > >
> > > > > >
> > > > > > On 2/2/16 15:17, Timothy Farkas wrote:
> > > > > >
> > > > > >> Could we provide Processing and Output Centric Aliases for the
> > > > > >> ProcessingModes?
> > > > > >>
> > > > > >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> > > > > >> ProcessingMode.EXACTLY_ONCE_OUTPUT =
> ProcessingMode.AT_LEAST_ONCE
> > > > > >>
> > > > > >> ProcessingMode.AT_MOST_ONCE_PROCESSING =
> > ProcessingMode.AT_MOST_ONCE
> > > > > >> ProcessingMode.AT_LEAST_ONCE_PROCESSING =
> > > ProcessingMode.AT_LEAST_ONCE
> > > > > >> ProcessingMode.EXACTLY_ONCE_PROCESSING =
> > ProcessingMode.EXACTLY_ONCE
> > > > > >>
> > > > > >> Tim
> > > > > >>
> > > > > >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <
> > > > [email protected]
> > > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> Well output guarantees are managed by the operators themselves
> so
> > > the
> > > > > user
> > > > > >>> will typically not see that as part of the engine features,
> they
> > > only
> > > > > see
> > > > > >>> processing guarantees and while they are technically correct as
> > far
> > > > as
> > > > > >>> individual operators are concerned the names give a different
> > idea.
> > > > > >>>
> > > > > >>> Thanks
> > > > > >>>
> > > > > >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <
> > > [email protected]>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> I think I understand the ambiguity you are trying to clear up
> > > Pramod.
> > > > > >>>> Perhaps it can be disambiguated by distinguishing between
> > > Processing
> > > > > >>>> Guarantees and Output Guarantees, when explaining to people.
> > > > > Processing
> > > > > >>>> Guarantees apply to the way tuples are transmitted between
> > > > operators.
> > > > > >>>> Output Guarantees apply to the way output operators write
> tuples
> > > to
> > > > a
> > > > > >>>>
> > > > > >>> Data
> > > > > >>>
> > > > > >>>> Sink.
> > > > > >>>>
> > > > > >>>> This way we can describe each term intuitively in each
> context:
> > > > > >>>>
> > > > > >>>> At Most Once: A tuple can be dropped or transmitted (written)
> > only
> > > > > once.
> > > > > >>>> At Least Once: A tuple can be transmitted (written) one or
> more
> > > > times.
> > > > > >>>> Exactly Once: A tuple is transmitted (written) only once.
> > > > > >>>>
> > > > > >>>> Then we could provide a table with the strongest Output
> > Guarantee
> > > > that
> > > > > >>>> is
> > > > > >>>> possible for each Processing Guarantee.
> > > > > >>>>
> > > > > >>>> Processing          |   Strongest Output Guarantee
> > > > > >>>> ----------------------------------------------
> > > > > >>>> At Most Once      | At Most Once
> > > > > >>>> At Least Once     | Exactly Once
> > > > > >>>> Exactly Once      |  Exactly Once
> > > > > >>>>
> > > > > >>>> Thoughts?
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>> Tim
> > > > > >>>>
> > > > > >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <
> > > > > [email protected]>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > > >>>>> I agree with Tim. Instead of new terminology, a better
> > > > > > >>>>> explanation of the existing ones is more useful.
> > > > > >>>>>
> > > > > >>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <
> > > > > [email protected]
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>> The idea is to disambiguate without using at least once since
> > > > exactly
> > > > > >>>>>>
> > > > > >>>>> once
> > > > > >>>>>
> > > > > >>>>>> output can still be achieved with those. Any other names are
> > > fine,
> > > > > >>>>>>
> > > > > >>>>> those
> > > > > >>>>
> > > > > >>>>> were just suggestions.
> > > > > >>>>>>
> > > > > >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <
> > > > [email protected]
> > > > > >
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>> The new names don't make as much sense to me as the original
> > > > names.
> > > > > >>>>>>>
> > > > > >>>>>> The
> > > > > >>>>
> > > > > >>>>> concepts require some thought to understand, and it won't
> > > > > >>>>>>>
> > > > > >>>>>> necessarily
> > > > > >>>
> > > > > >>>> be
> > > > > >>>>>
> > > > > >>>>>> made easier with a name change. I think a better way to
> attack
> > > > > >>>>>>> misunderstandings is to clearly explain what a window,
> > > operator,
> > > > > >>>>>>>
> > > > > >>>>>> input
> > > > > >>>>
> > > > > >>>>> operator, output operator, tuple, checkpoint, and DAG is with
> > > > > >>>>>>>
> > > > > >>>>>> really
> > > > > >>>
> > > > > >>>> clean
> > > > > >>>>>>
> > > > > >>>>>>> and simple illustrations of the concepts. Then we can
> explain
> > > > more
> > > > > >>>>>>>
> > > > > >>>>>> involved
> > > > > >>>>>>
> > > > > >>>>>>> concepts like At Least Once, At Most Once, and Exactly Once
> > > with
> > > > > >>>>>>>
> > > > > >>>>>> well
> > > > > >>>
> > > > > >>>> thought illustrations. Without a clear explanation of the
> basic
> > > > > >>>>>>>
> > > > > >>>>>> vocabulary,
> > > > > >>>>>>
> > > > > >>>>>>> and without pictures, it is difficult to get even technical
> > > > people
> > > > > >>>>>>>
> > > > > >>>>>> to
> > > > > >>>
> > > > > >>>> understand these concepts.
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>> Tim
> > > > > >>>>>>>
> > > > > >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <
> > > > > >>>>>>>
> > > > > >>>>>> [email protected]>
> > > > > >>>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> Today we support three different processing modes for
> > > operators,
> > > > > >>>>>>>>
> > > > > >>>>>>> "at
> > > > > >>>>
> > > > > >>>>> least
> > > > > >>>>>>>
> > > > > >>>>>>>> once", "at most once" and "exactly once" which determine
> > tuple
> > > > > >>>>>>>>
> > > > > >>>>>>> processing
> > > > > >>>>>>
> > > > > >>>>>>> and recovery behavior when there is operator recovery from
> > > > > >>>>>>>>
> > > > > >>>>>>> failure.
> > > > > >>>
> > > > > >>>> The
> > > > > >>>>>
> > > > > >>>>>> default being at least once where the tuples are replayed
> from
> > > > > >>>>>>>>
> > > > > >>>>>>> the
> > > > > >>>
> > > > > >>>> recovered checkpoint.
> > > > > >>>>>>>>
> > > > > >>>>>>>> At least once works well for most applications. Typically
> > > > > >>>>>>>>
> > > > > >>>>>>> applications
> > > > > >>>>>
> > > > > >>>>>> persist the final output of processing through the DAG into
> > > > > >>>>>>>>
> > > > > >>>>>>> various
> > > > > >>>
> > > > > >>>> outputs
> > > > > >>>>>>>
> > > > > >>>>>>>> like key value stores, databases or even HDFS files. In
> many
> > > of
> > > > > >>>>>>>>
> > > > > >>>>>>> these
> > > > > >>>>
> > > > > >>>>> cases
> > > > > >>>>>>>
> > > > > >>>>>>>> various strategies can be employed to save the data
> "exactly
> > > > > >>>>>>>>
> > > > > >>>>>>> once"
> > > > > >>>
> > > > > >>>> in
> > > > > >>>>
> > > > > >>>>> the
> > > > > >>>>>>
> > > > > >>>>>>> output, such as transactions, rewinding, meta data storage,
> > > > > >>>>>>>>
> > > > > >>>>>>> idempotent
> > > > > >>>>>
> > > > > >>>>>> operations etc. Furthermore the exactly once processing
> mode,
> > > > > >>>>>>>>
> > > > > >>>>>>> which
> > > > > >>>
> > > > > >>>> is
> > > > > >>>>>
> > > > > >>>>>> a
> > > > > >>>>>>
> > > > > >>>>>>> checkpoint performed every window is rarely used. All this
> > > leads
> > > > > >>>>>>>>
> > > > > >>>>>>> to
> > > > > >>>
> > > > > >>>> confusion especially to somebody new and also makes it
> difficult
> > > > > >>>>>>>>
> > > > > >>>>>>> to
> > > > > >>>
> > > > > >>>> explain
> > > > > >>>>>>>
> > > > > >>>>>>>> these names to less technical audience in meetups and
> public
> > > > > >>>>>>>>
> > > > > >>>>>>> forums.
> > > > > >>>>
> > > > > >>>>> What I am proposing is only a name change which will make
> this
> > > > > >>>>>>>>
> > > > > >>>>>>> more
> > > > > >>>
> > > > > >>>> intuitive to understand. Something simple like "repeat" for
> "at
> > > > > >>>>>>>>
> > > > > >>>>>>> least
> > > > > >>>>
> > > > > >>>>> once", "latest" for "at most once" and "repeat latest" for
> > > > > >>>>>>>>
> > > > > >>>>>>> "exactly
> > > > > >>>
> > > > > >>>> once"
> > > > > >>>>>>
> > > > > >>>>>>> can do the trick.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >
> > > > >
> > > >
> > >
> >
>
