I think I understand the ambiguity you are trying to clear up, Pramod.
Perhaps it can be disambiguated by distinguishing between Processing
Guarantees and Output Guarantees, when explaining to people. Processing
Guarantees apply to the way tuples are transmitted between operators.
Output Guarantees apply to the way output operators write tuples to a Data
Sink.

This way we can describe each term intuitively in each context:

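At Most Once: A tuple is transmitted (written) zero or one times; it may be dropped but is never duplicated.
At Least Once: A tuple is transmitted (written) one or more times; it may be duplicated but is never dropped.
Exactly Once: A tuple is transmitted (written) exactly one time; it is neither dropped nor duplicated.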
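
To make the three definitions concrete, here is a minimal Python sketch that treats each guarantee as a bound on how many times each tuple shows up downstream. The function name and the idea of comparing a "sent" list against a "received" log are my own illustration, not anything in the platform.

```python
from collections import Counter


def satisfied_guarantees(sent, received):
    """Return the guarantees that a delivery log satisfies.

    sent:     unique tuple ids emitted upstream (hypothetical input).
    received: tuple ids observed downstream, possibly with repeats.
    """
    counts = Counter(received)
    per_tuple = [counts[t] for t in sent]  # 0 if a tuple was dropped

    guarantees = set()
    if all(c <= 1 for c in per_tuple):
        guarantees.add("at most once")    # drops allowed, no duplicates
    if all(c >= 1 for c in per_tuple):
        guarantees.add("at least once")   # duplicates allowed, no drops
    if all(c == 1 for c in per_tuple):
        guarantees.add("exactly once")    # neither drops nor duplicates
    return guarantees
```

Note that a log satisfying Exactly Once also satisfies the other two, which is why it is the strongest of the three.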

Then we could provide a table with the strongest Output Guarantee that is
possible for each Processing Guarantee.

Processing Guarantee | Strongest Output Guarantee
---------------------|---------------------------
At Most Once         | At Most Once
At Least Once        | Exactly Once
Exactly Once         | Exactly Once
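
The At Least Once row is the one that tends to surprise people, so here is a rough Python sketch of why it holds. It assumes a hypothetical key-value sink where writes are keyed by tuple id, so a replayed write overwrites the earlier one instead of adding a duplicate. The class and function names are made up for illustration; this is not platform code.

```python
class IdempotentSink:
    """Hypothetical key-value data sink; writes are keyed by tuple id."""

    def __init__(self):
        self.store = {}
        self.write_calls = 0

    def write(self, tuple_id, value):
        self.write_calls += 1
        # Writing the same key and value again leaves the store unchanged.
        self.store[tuple_id] = value


def run_with_replay(tuples, sink, checkpoint_index, fail_at_index):
    """Write tuples, fail once, then replay from the last checkpoint."""
    for i, (tuple_id, value) in enumerate(tuples):
        if i == fail_at_index:
            break  # simulated operator failure
        sink.write(tuple_id, value)
    # Recovery: everything after the checkpoint is transmitted again,
    # so some tuples reach the output operator more than once.
    for tuple_id, value in tuples[checkpoint_index:]:
        sink.write(tuple_id, value)


tuples = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
sink = IdempotentSink()
run_with_replay(tuples, sink, checkpoint_index=1, fail_at_index=3)
```

Here the output operator receives six writes for four tuples (At Least Once processing), yet the sink ends up holding each tuple exactly once (Exactly Once output).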

Thoughts?

Thanks,
Tim

On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]>
wrote:

> I agree with Tim. Instead of new terminology, better explanations of the
> existing terms would be more useful.
>
> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]>
> wrote:
>
> > The idea is to disambiguate without using the term "at least once", since
> > exactly once output can still be achieved in that mode. Any other names
> > are fine; those were just suggestions.
> >
> > On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]>
> > wrote:
> >
> > > The new names don't make as much sense to me as the original names. The
> > > concepts require some thought to understand, and a name change won't
> > > necessarily make them easier. I think a better way to attack
> > > misunderstandings is to clearly explain what a window, operator, input
> > > operator, output operator, tuple, checkpoint, and DAG are, with really
> > > clean and simple illustrations of the concepts. Then we can explain more
> > > involved concepts like At Least Once, At Most Once, and Exactly Once with
> > > well-thought-out illustrations. Without a clear explanation of the basic
> > > vocabulary, and without pictures, it is difficult to get even technical
> > > people to understand these concepts.
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]>
> > > wrote:
> > >
> > > > Today we support three different processing modes for operators, "at
> > > > least once", "at most once" and "exactly once", which determine tuple
> > > > processing and recovery behavior when an operator recovers from
> > > > failure. The default is at least once, where the tuples are replayed
> > > > from the recovered checkpoint.
> > > >
> > > > At least once works well for most applications. Typically, applications
> > > > persist the final output of processing through the DAG into various
> > > > outputs like key value stores, databases or even HDFS files. In many of
> > > > these cases various strategies can be employed to save the data
> > > > "exactly once" in the output, such as transactions, rewinding, metadata
> > > > storage, idempotent operations etc. Furthermore, the exactly once
> > > > processing mode, which is a checkpoint performed every window, is rarely
> > > > used. All this leads to confusion, especially for somebody new, and also
> > > > makes it difficult to explain these names to a less technical audience
> > > > in meetups and public forums.
> > > >
> > > > What I am proposing is only a name change, which will make this more
> > > > intuitive to understand. Something simple like "repeat" for "at least
> > > > once", "latest" for "at most once" and "repeat latest" for "exactly
> > > > once" can do the trick.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
