I agree with Tim. Rather than new terminology, better explanations of the existing terms would be more useful.

On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]> wrote:

> The idea is to disambiguate without using at least once since exactly once
> output can still be achieved with those. Any other names are fine, those
> were just suggestions.
>
> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]> wrote:
>
> > The new names don't make as much sense to me as the original names. The
> > concepts require some thought to understand, and it won't necessarily be
> > made easier with a name change. I think a better way to attack
> > misunderstandings is to clearly explain what a window, operator, input
> > operator, output operator, tuple, checkpoint, and DAG is with really
> > clean and simple illustrations of the concepts. Then we can explain more
> > involved concepts like At Least Once, At Most Once, and Exactly Once
> > with well thought illustrations. Without a clear explanation of the
> > basic vocabulary, and without pictures, it is difficult to get even
> > technical people to understand these concepts.
> >
> > Thanks,
> > Tim
> >
> > On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]> wrote:
> >
> > > Today we support three different processing modes for operators, "at
> > > least once", "at most once" and "exactly once" which determine tuple
> > > processing and recovery behavior when there is operator recovery from
> > > failure. The default being at least once where the tuples are replayed
> > > from the recovered checkpoint.
> > >
> > > At least once works well for most applications. Typically applications
> > > persist the final output of processing through the DAG into various
> > > outputs like key value stores, databases or even HDFS files. In many
> > > of these cases various strategies can be employed to save the data
> > > "exactly once" in the output, such as transactions, rewinding, meta
> > > data storage, idempotent operations etc. Furthermore the exactly once
> > > processing mode, which is a checkpoint performed every window is
> > > rarely used. All this leads to confusion especially to somebody new
> > > and also makes it difficult to explain these names to less technical
> > > audience in meetups and public forums.
> > >
> > > What I am proposing is only a name change which will make this more
> > > intuitive to understand. Something simple like "repeat" for "at least
> > > once", "latest" for "at most once" and "repeat latest" for "exactly
> > > once" can do the trick.
> > >
> > > Thanks
