Agreed on sticking to standard terminology and explaining the details. A deep
technical blog post plus a section on this topic in the Apex docs would work.

Thks,
Amol


On Tue, Feb 2, 2016 at 10:51 PM, Thomas Weise <[email protected]>
wrote:

> We should stick with standard terminology but make sure the differences are
> well explained. That's necessary because other platforms use the same words
> with different meanings; compare Storm, Spark Streaming and Flink.
>
> Take "exactly once" as an example. Elsewhere you will find it claimed when it
> really is "at least once": events are replayed and computation is repeated.
> Only when all operations in the overall system are idempotent is it
> possible to avoid effects such as double counting, duplicate web service
> calls or duplicate rows in the database. Hence, the engine alone cannot claim
> to support "exactly once"; the claim is only valid when the operators used in
> the application collectively support it.
>
> In Apex, the engine provides the hooks (endWindow, committed) to achieve
> idempotency in operators that have an effect on external systems. There are
> several implementations of operators that can be used with at-least-once
> processing mode that will deliver "exactly-once" for the application when
> all operations in the DAG are idempotent.
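
The idea in the paragraph above can be sketched in plain Java. This is only an illustration under stated assumptions, not Apex code: the classes Store and OutputOperator are hypothetical stand-ins, the method names beginWindow/endWindow merely mirror the hooks mentioned above, and the "external system" is an in-memory list that records the last window it accepted. The point it tries to show is that replayed windows (at-least-once processing) need not produce duplicate rows when the write is keyed by window id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IdempotentSinkSketch {

    /** Hypothetical external store that remembers the last window it accepted. */
    static class Store {
        long lastCommittedWindow = -1;
        final List<String> rows = new ArrayList<>();

        /** Writes a window's rows together with its id; a replayed window is ignored. */
        boolean commit(long windowId, List<String> batch) {
            if (windowId <= lastCommittedWindow) {
                return false; // already written before the failure, skip the duplicate
            }
            rows.addAll(batch);
            lastCommittedWindow = windowId;
            return true;
        }
    }

    /** Hypothetical output operator: buffers tuples and writes once per window. */
    static class OutputOperator {
        private final Store store;
        private final List<String> buffer = new ArrayList<>();
        private long currentWindow;

        OutputOperator(Store store) { this.store = store; }

        void beginWindow(long windowId) { currentWindow = windowId; buffer.clear(); }
        void process(String tuple) { buffer.add(tuple); }
        void endWindow() { store.commit(currentWindow, buffer); }
    }

    /** Feeds windows from..to to the operator, as an engine would on play or replay. */
    static void play(OutputOperator op, Map<Long, List<String>> windows, long from, long to) {
        for (long w = from; w <= to; w++) {
            op.beginWindow(w);
            for (String tuple : windows.get(w)) {
                op.process(tuple);
            }
            op.endWindow();
        }
    }

    /** Simulates a failure after window 1 followed by a replay from window 0. */
    public static List<String> runWithFailure() {
        Map<Long, List<String>> windows = new TreeMap<>();
        windows.put(0L, Arrays.asList("a", "b"));
        windows.put(1L, Arrays.asList("c"));
        windows.put(2L, Arrays.asList("d", "e"));

        Store store = new Store();
        play(new OutputOperator(store), windows, 0, 1); // windows 0 and 1 written, then failure
        // Recovery: a fresh operator instance replays windows 0 and 1 before reaching 2.
        play(new OutputOperator(store), windows, 0, 2);
        return store.rows;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailure()); // prints [a, b, c, d, e]
    }
}
```

Windows 0 and 1 are processed twice, yet each row lands in the store once. A real operator would have to persist the rows and the window id atomically in the external system; the sketch glosses over that.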
>
>
>
>
>
> On Tue, Feb 2, 2016 at 10:26 PM, Shubham Pathak <[email protected]>
> wrote:
>
> > +1 for adding detailed explanation about the concepts in tutorials.
> >
> >
> > On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <
> > [email protected]>
> > wrote:
> >
> > > +1 for Vlad's suggestion. Searching for keywords like "at least once",
> > > "at most once" and "exactly once" shows that these terms are widely
> > > used wherever semantics are defined for tuple processing.
> > > Adding example applications for each of them would help explain the
> > > terminology in the Apex context.
> > >
> > > On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <
> > [email protected]
> > > >
> > > wrote:
> > >
> > > > I do agree with Vlad. It would be good to have a good explanation with
> > > > examples for the existing names, as that will not create confusion for
> > > > those who already know them or for those who are beginners.
> > > >
> > > > On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <[email protected]>
> > wrote:
> > > >
> > > > > I agree with Vlad too.
> > > > >
> > > > > Thks
> > > > > Amol
> > > > >
> > > > >
> > > > > On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <
> > [email protected]
> > > >
> > > > > wrote:
> > > > >
> > > > > > I agree with Vlad: these names are so deeply embedded in the community
> > > > > > that changing them is likely to create more problems than it solves.
> > > > > >
> > > > > > Ram
> > > > > >
> > > > > > On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > I vote to keep the original names and educate/explain their meaning to
> > > > > > > a non-technical audience, as the delivery guarantee is not specific to
> > > > > > > Apex but has a common meaning for all streaming platforms.
> > > > > > >
> > > > > > > Vlad
> > > > > > >
> > > > > > >
> > > > > > > On 2/2/16 15:17, Timothy Farkas wrote:
> > > > > > >
> > > > > > >> Could we provide Processing and Output Centric Aliases for the
> > > > > > >> ProcessingModes?
> > > > > > >>
> > > > > > >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> > > > > > >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> > > > > > >>
> > > > > > >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> > > > > > >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> > > > > > >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> > > > > > >>
> > > > > > >> Tim
> > > > > > >>
> > > > > > >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <
> > > > > [email protected]
> > > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> Well, output guarantees are managed by the operators themselves, so the
> > > > > > >>> user will typically not see them as part of the engine features; they only
> > > > > > >>> see processing guarantees, and while those are technically correct as far
> > > > > > >>> as individual operators are concerned, the names give a different idea.
> > > > > > >>>
> > > > > > >>> Thanks
> > > > > > >>>
> > > > > > >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <
> > > > [email protected]>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> I think I understand the ambiguity you are trying to clear up, Pramod.
> > > > > > >>>> Perhaps it can be disambiguated by distinguishing between Processing
> > > > > > >>>> Guarantees and Output Guarantees when explaining to people. Processing
> > > > > > >>>> Guarantees apply to the way tuples are transmitted between operators.
> > > > > > >>>> Output Guarantees apply to the way output operators write tuples to a
> > > > > > >>>> Data Sink.
> > > > > > >>>>
> > > > > > >>>> This way we can describe each term intuitively in each context:
> > > > > > >>>>
> > > > > > >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> > > > > > >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> > > > > > >>>> Exactly Once: A tuple is transmitted (written) only once.
> > > > > > >>>>
> > > > > > >>>> Then we could provide a table with the strongest Output Guarantee that
> > > > > > >>>> is possible for each Processing Guarantee.
> > > > > > >>>>
> > > > > > >>>> Processing Guarantee | Strongest Output Guarantee
> > > > > > >>>> ------------------------------------------------
> > > > > > >>>> At Most Once         | At Most Once
> > > > > > >>>> At Least Once        | Exactly Once
> > > > > > >>>> Exactly Once         | Exactly Once
> > > > > > >>>> Thoughts?
> > > > > > >>>>
> > > > > > >>>> Thanks,
> > > > > > >>>> Tim
> > > > > > >>>>
> > > > > > >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <
> > > > > > [email protected]>
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > >>>>> I agree with Tim. Instead of new terminology, better explanations of
> > > > > > >>>>> the existing terms would be more useful.
> > > > > > >>>>>
> > > > > > >>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <
> > > > > > [email protected]
> > > > > > >>>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> The idea is to disambiguate without using "at least once", since exactly
> > > > > > >>>>>> once output can still be achieved with those. Any other names are fine;
> > > > > > >>>>>> those were just suggestions.
> > > > > > >>>>>>
> > > > > > >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <
> > > > > [email protected]
> > > > > > >
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> The new names don't make as much sense to me as the original names. The
> > > > > > >>>>>>> concepts require some thought to understand, and they won't necessarily
> > > > > > >>>>>>> be made easier by a name change. I think a better way to attack
> > > > > > >>>>>>> misunderstandings is to clearly explain what a window, operator, input
> > > > > > >>>>>>> operator, output operator, tuple, checkpoint, and DAG are, with really
> > > > > > >>>>>>> clean and simple illustrations of the concepts. Then we can explain more
> > > > > > >>>>>>> involved concepts like At Least Once, At Most Once, and Exactly Once with
> > > > > > >>>>>>> well-thought-out illustrations. Without a clear explanation of the basic
> > > > > > >>>>>>> vocabulary, and without pictures, it is difficult to get even technical
> > > > > > >>>>>>> people to understand these concepts.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thanks,
> > > > > > >>>>>>> Tim
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <
> > > > > > >>>>>>>
> > > > > > >>>>>> [email protected]>
> > > > > > >>>>>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Today we support three different processing modes for operators, "at
> > > > > > >>>>>>>> least once", "at most once" and "exactly once", which determine tuple
> > > > > > >>>>>>>> processing and recovery behavior when an operator recovers from a
> > > > > > >>>>>>>> failure. The default is at least once, where the tuples are replayed
> > > > > > >>>>>>>> from the recovered checkpoint.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> At least once works well for most applications. Typically applications
> > > > > > >>>>>>>> persist the final output of processing through the DAG into various
> > > > > > >>>>>>>> outputs like key value stores, databases or even HDFS files. In many of
> > > > > > >>>>>>>> these cases various strategies can be employed to save the data "exactly
> > > > > > >>>>>>>> once" in the output, such as transactions, rewinding, metadata storage,
> > > > > > >>>>>>>> idempotent operations etc. Furthermore, the exactly once processing
> > > > > > >>>>>>>> mode, which is a checkpoint performed every window, is rarely used. All
> > > > > > >>>>>>>> this leads to confusion, especially for somebody new, and also makes it
> > > > > > >>>>>>>> difficult to explain these names to a less technical audience in meetups
> > > > > > >>>>>>>> and public forums. What I am proposing is only a name change which will
> > > > > > >>>>>>>> make this more intuitive to understand. Something simple like "repeat"
> > > > > > >>>>>>>> for "at least once", "latest" for "at most once" and "repeat latest" for
> > > > > > >>>>>>>> "exactly once" can do the trick.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Thanks
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
