I vote to keep the original names and explain their meaning to non-technical audiences, as delivery guarantees are not specific to Apex but have a common meaning across all streaming platforms.

Vlad

On 2/2/16 15:17, Timothy Farkas wrote:
Could we provide processing-centric and output-centric aliases for the
ProcessingModes?

ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE

ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
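In Java, aliases like these could be sketched as static constants that point at the existing enum values. This is a minimal sketch that assumes a three-value `ProcessingMode` enum like the one discussed in this thread. The holder class `ProcessingModeAliases` is hypothetical and is not part of any released Apex API.

```java
// Hypothetical sketch: ProcessingMode stands in for the operator processing
// modes discussed in this thread; the alias names follow the proposal above
// and do not exist in any released Apex API.
class ProcessingModeAliases {

  enum ProcessingMode { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

  // Output-centric aliases: each names the output guarantee the mode is meant to advertise.
  static final ProcessingMode AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE;
  static final ProcessingMode EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE;

  // Processing-centric aliases: one-to-one with the existing names.
  static final ProcessingMode AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE;
  static final ProcessingMode AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE;
  static final ProcessingMode EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE;

  public static void main(String[] args) {
    // Aliases are the same enum constants, so identity comparison holds.
    System.out.println(EXACTLY_ONCE_OUTPUT == AT_LEAST_ONCE_PROCESSING); // prints "true"
  }
}
```

Because the aliases are plain constants rather than new enum values, existing code that refers to the original names keeps compiling unchanged.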

Tim

On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]> wrote:

Well, output guarantees are managed by the operators themselves, so users
will typically not see them as part of the engine features. They only see
the processing guarantees, and while those names are technically correct as
far as individual operators are concerned, they give a different idea of
what the application's output actually gets.

Thanks

On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]> wrote:

I think I understand the ambiguity you are trying to clear up, Pramod.
Perhaps it can be disambiguated by distinguishing between Processing
Guarantees and Output Guarantees when explaining to people. Processing
Guarantees apply to the way tuples are transmitted between operators.
Output Guarantees apply to the way output operators write tuples to a
Data Sink.

This way we can describe each term intuitively in each context:

At Most Once: A tuple is either dropped or transmitted (written) once.
At Least Once: A tuple is transmitted (written) one or more times.
Exactly Once: A tuple is transmitted (written) once and only once.

Then we could provide a table with the strongest Output Guarantee that is
possible for each Processing Guarantee.

Processing Guarantee | Strongest Output Guarantee
---------------------+---------------------------
At Most Once         | At Most Once
At Least Once        | Exactly Once
Exactly Once         | Exactly Once
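The table can also be read as a simple mapping function. Below is a hypothetical Java sketch in which the enum and method names are illustrative, not Apex API. The at-least-once row only reaches exactly-once output under an assumption: the output operator must absorb replayed tuples, for example through transactions or idempotent writes.

```java
// Hypothetical sketch of the processing-to-output table; names are illustrative.
class StrongestOutputGuarantee {

  enum ProcessingMode { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

  enum OutputGuarantee { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

  // Returns the strongest output guarantee an output operator can achieve on top
  // of the given processing mode. For AT_LEAST_ONCE this assumes the output
  // operator handles replayed tuples (transactions, idempotent writes, etc.).
  static OutputGuarantee strongest(ProcessingMode mode) {
    switch (mode) {
      case AT_MOST_ONCE:
        return OutputGuarantee.AT_MOST_ONCE; // dropped tuples cannot be recovered downstream
      case AT_LEAST_ONCE:
      case EXACTLY_ONCE:
        return OutputGuarantee.EXACTLY_ONCE;
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    for (ProcessingMode mode : ProcessingMode.values()) {
      System.out.println(mode + " -> " + strongest(mode));
    }
  }
}
```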

Thoughts?

Thanks,
Tim

On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]> wrote:

I agree with Tim. Rather than new terminology, better explanations of the
existing ones would be more useful.

On Tue, Feb 2, 2016 at 2:23 PM, Pramod Immaneni <[email protected]> wrote:

The idea is to disambiguate by not using "at least once", since exactly
once output can still be achieved with that mode. Any other names are
fine; those were just suggestions.

On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]> wrote:

The new names don't make as much sense to me as the original names. The
concepts require some thought to understand, and that won't necessarily be
made easier by a name change. I think a better way to attack
misunderstandings is to clearly explain what a window, operator, input
operator, output operator, tuple, checkpoint, and DAG are, with really clean
and simple illustrations of the concepts. Then we can explain more involved
concepts like At Least Once, At Most Once, and Exactly Once with
well-thought-out illustrations. Without a clear explanation of the basic
vocabulary, and without pictures, it is difficult to get even technical
people to understand these concepts.

Thanks,
Tim

On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]> wrote:

Today we support three different processing modes for operators, "at least
once", "at most once" and "exactly once", which determine tuple processing
and recovery behavior when an operator recovers from failure. The default is
at least once, where tuples are replayed from the recovered checkpoint.
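To make the default behavior concrete, here is a small Java simulation built on a few simplifying assumptions. Each window carries one tuple, and a checkpoint is taken at the end of a given window. After a failure, operator state is restored from that checkpoint and the later windows are replayed, so a naive append-only sink receives those windows twice. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of at-least-once recovery; not Apex API.
// Each window carries one tuple, and the operator keeps a running count as state.
// Assumes checkpointWindow < failAfterWindow <= lastWindow.
class AtLeastOnceReplay {

  static List<String> run(int checkpointWindow, int failAfterWindow, int lastWindow) {
    List<String> sink = new ArrayList<>(); // naive append-only output, never rolled back
    int count = 0;             // operator state
    int checkpointedCount = 0; // state saved at the checkpoint
    boolean failed = false;
    int window = 1;
    while (window <= lastWindow) {
      count++;
      sink.add("w" + window + ":count=" + count);
      if (window == checkpointWindow) {
        checkpointedCount = count; // checkpoint taken at the end of this window
      }
      if (!failed && window == failAfterWindow) {
        failed = true;
        count = checkpointedCount;     // restore operator state from the checkpoint
        window = checkpointWindow + 1; // upstream replays windows after the checkpoint
        continue;
      }
      window++;
    }
    return sink;
  }

  public static void main(String[] args) {
    // Checkpoint after window 2, fail after window 4, run through window 5.
    System.out.println(run(2, 4, 5));
  }
}
```

In this run, windows 3 and 4 appear twice in the sink, while the final count of 5 is still correct. Processing state recovered cleanly, but the output was duplicated. The output operator has to handle that if it wants exactly-once output.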

At least once works well for most applications. Typically, applications
persist the final output of processing through the DAG into various outputs
like key-value stores, databases or even HDFS files. In many of these cases
various strategies can be employed to save the data "exactly once" in the
output, such as transactions, rewinding, metadata storage, idempotent
operations, etc. Furthermore, the exactly once processing mode, which
performs a checkpoint every window, is rarely used. All this leads to
confusion, especially for somebody new, and also makes it difficult to
explain these names to less technical audiences at meetups and in public
forums. What I am proposing is only a name change that will make this more
intuitive to understand. Something simple like "repeat" for "at least once",
"latest" for "at most once" and "repeat latest" for "exactly once" can do
the trick.
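One example of the "metadata storage" strategy: an output operator stores the id of the last window it committed and skips replayed windows at or below that id. Below is a minimal Java sketch. It assumes monotonically increasing window ids and a store in which the data and the window id are written atomically, which is only simulated here in memory. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical output operator that records the last committed window id next to
// its data. In a real store, the rows and the window id would be written in one
// transaction and survive the failure; here both live in memory for illustration.
class WindowIdDedupSink {
  private final List<String> rows = new ArrayList<>();
  private long lastCommittedWindow = -1;

  // Writes one window's tuples; returns false if the window is a replay and is skipped.
  boolean writeWindow(long windowId, List<String> tuples) {
    if (windowId <= lastCommittedWindow) {
      return false; // committed before the failure; ignore the replayed copy
    }
    rows.addAll(tuples);
    lastCommittedWindow = windowId;
    return true;
  }

  List<String> rows() {
    return rows;
  }

  public static void main(String[] args) {
    WindowIdDedupSink sink = new WindowIdDedupSink();
    sink.writeWindow(1, Arrays.asList("a"));
    sink.writeWindow(2, Arrays.asList("b"));
    sink.writeWindow(3, Arrays.asList("c"));
    // After recovery from a checkpoint at window 1, windows 2 and 3 are replayed.
    sink.writeWindow(2, Arrays.asList("b"));
    sink.writeWindow(3, Arrays.asList("c"));
    sink.writeWindow(4, Arrays.asList("d"));
    System.out.println(sink.rows()); // prints [a, b, c, d]
  }
}
```

The upstream delivery is still at least once, yet each window's output lands in the store exactly once. This illustrates that the processing mode name alone does not describe the end-to-end result.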

Thanks

