The new names don't make as much sense to me as the original names. The concepts require some thought to understand, and a name change won't necessarily make them easier. I think a better way to attack misunderstandings is to clearly explain what a window, an operator, an input operator, an output operator, a tuple, a checkpoint, and a DAG are, with really clean and simple illustrations of the concepts. Then we can explain more involved concepts like At Least Once, At Most Once, and Exactly Once with well-thought-out illustrations. Without a clear explanation of the basic vocabulary, and without pictures, it is difficult to get even technical people to understand these concepts.
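As one example of the kind of illustration I mean, here is a toy simulation of the three modes. It is a minimal sketch under stated assumptions, not Apex's actual recovery code: each window carries one tuple, checkpoints happen every `checkpoint_every` windows, and a single failure occurs right after window `fail_after` completes (both knobs are hypothetical). Per Pramod's description, at least once restores the last checkpoint and replays the windows after it, and exactly once checkpoints every window; the model assumes at most once restores the checkpoint and simply moves on to the latest window without replay.

```python
from enum import Enum


class Mode(Enum):
    # Processing modes from the thread; the proposed new names are in comments.
    AT_LEAST_ONCE = "at least once"  # proposed: "repeat"
    AT_MOST_ONCE = "at most once"    # proposed: "latest"
    EXACTLY_ONCE = "exactly once"    # proposed: "repeat latest"


def run(mode, num_windows=6, fail_after=5, checkpoint_every=3):
    """Toy model (not Apex code): one tuple per window, one failure.

    Returns (state, emitted): the window ids folded into the operator's
    state at the end, and the window ids whose results went downstream,
    in emission order. checkpoint_every and fail_after are hypothetical
    knobs for this sketch.
    """
    if mode is Mode.EXACTLY_ONCE:
        checkpoint_every = 1  # exactly once: checkpoint after every window
    state = []                # window ids the operator has processed
    checkpoint = ([], 0)      # (saved state, last checkpointed window id)
    emitted = []
    failed = False
    w = 1
    while w <= num_windows:
        state.append(w)       # process the window's tuple
        emitted.append(w)     # send its result downstream
        if w % checkpoint_every == 0:
            checkpoint = (list(state), w)
        if w == fail_after and not failed:
            failed = True
            saved_state, saved_window = checkpoint
            state = list(saved_state)  # every mode restores the checkpoint
            if mode is Mode.AT_LEAST_ONCE:
                w = saved_window + 1   # replay windows since the checkpoint
                continue
            # AT_MOST_ONCE: move on to the next (latest) window, so windows
            # after the checkpoint are lost from the operator's state.
            # EXACTLY_ONCE: the checkpoint is this window, so nothing is lost.
        w += 1
    return state, emitted


for mode in Mode:
    state, emitted = run(mode)
    print(f"{mode.value:>13}: state={state} emitted={emitted}")
```

With the defaults, at least once ends with the complete state but emits windows 4 and 5 twice; at most once emits each window once, but its state is missing windows 4 and 5; exactly once does neither. The duplicated outputs are what the transactional or idempotent output strategies mentioned in the thread are meant to absorb.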
Thanks,
Tim

On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]> wrote:

> Today we support three different processing modes for operators, "at least
> once", "at most once" and "exactly once", which determine tuple processing
> and recovery behavior when an operator recovers from failure. The
> default is at least once, where the tuples are replayed from the
> recovered checkpoint.
>
> At least once works well for most applications. Typically applications
> persist the final output of processing through the DAG into various outputs
> like key value stores, databases or even HDFS files. In many of these cases
> various strategies can be employed to save the data "exactly once" in the
> output, such as transactions, rewinding, metadata storage, idempotent
> operations etc. Furthermore, the exactly once processing mode, which
> performs a checkpoint every window, is rarely used. All this leads to
> confusion, especially for somebody new, and also makes it difficult to
> explain these names to a less technical audience in meetups and public forums.
>
> What I am proposing is only a name change, which will make this more
> intuitive to understand. Something simple like "repeat" for "at least
> once", "latest" for "at most once" and "repeat latest" for "exactly once"
> can do the trick.
>
> Thanks
