The new names don't make as much sense to me as the original names. The
concepts require some thought to understand, and it won't necessarily be
made easier with a name change. I think a better way to attack
misunderstandings is to clearly explain what a window, operator, input
operator, output operator, tuple, checkpoint, and DAG are, with really clean
and simple illustrations of the concepts. Then we can explain more involved
concepts like At Least Once, At Most Once, and Exactly Once with
well-thought-out illustrations. Without a clear explanation of the basic vocabulary,
and without pictures, it is difficult to get even technical people to
understand these concepts.
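
As one possible illustration of the three modes, here is a toy sketch in
Python. It is not Apex code, and the function name, parameters, and crash
model are all made up for the example. It counts how many times a single
operator fully processes each window when it crashes partway through one
window. It assumes that "at least once" replays from the last checkpoint and
that "exactly once" checkpoints every window, as described in the quoted
message below. It also assumes that "at most once" resumes from the latest
window and skips whatever streamed by during the outage, which is only an
interpretation of the proposed name "latest".

```python
def completions(mode, total=8, crash_in=5, down_for=1, checkpoint_every=3):
    """Return how many times each window id is fully processed.

    Hypothetical toy model: windows are numbered 0..total-1, the operator
    crashes while processing window `crash_in` (so that window was not
    completed), and `down_for` windows stream by while it is down.
    """
    if mode not in ("at_least_once", "at_most_once", "exactly_once"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "exactly_once":
        checkpoint_every = 1  # a checkpoint at the end of every window

    counts = [0] * total

    # Before the crash, windows 0 .. crash_in-1 complete normally.
    for w in range(crash_in):
        counts[w] += 1

    # Last window whose end-of-window checkpoint was taken (-1 if none).
    last_ckpt = (crash_in // checkpoint_every) * checkpoint_every - 1

    if mode == "at_most_once":
        # Assumption: skip ahead to the latest window; nothing is replayed.
        resume = min(crash_in + down_for, total)
    else:
        # Restore the checkpointed state and replay everything after it.
        resume = last_ckpt + 1

    for w in range(resume, total):
        counts[w] += 1
    return counts


for mode in ("at_least_once", "at_most_once", "exactly_once"):
    print(mode, completions(mode))
```

With the default arguments, a count of 2 marks a window that was replayed
after recovery and a count of 0 marks a window that was lost, so the three
result lists can be read side by side like a small picture of each mode.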

Thanks,
Tim

On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]>
wrote:

> Today we support three different processing modes for operators, "at least
> once", "at most once" and "exactly once", which determine tuple processing
> and recovery behavior when an operator recovers from a failure. The
> default is at least once, where the tuples are replayed from the
> recovered checkpoint.
>
> At least once works well for most applications. Typically applications
> persist the final output of processing through the DAG into various outputs
> like key value stores, databases or even HDFS files. In many of these cases
> various strategies can be employed to save the data "exactly once" in the
> output, such as transactions, rewinding, metadata storage, idempotent
> operations etc. Furthermore, the exactly once processing mode, which is a
> checkpoint performed every window, is rarely used. All this leads to
> confusion, especially for somebody new, and also makes it difficult to explain
> these names to a less technical audience in meetups and public forums.
>
> What I am proposing is only a name change which will make this more
> intuitive to understand. Something simple like "repeat" for "at least
> once", "latest" for "at most once" and "repeat latest" for "exactly once"
> can do the trick.
>
> Thanks
>
