I am in the process of writing a technical blog on this topic.

Thanks

> On Feb 3, 2016, at 8:14 AM, Amol Kekre <[email protected]> wrote:
>
> Agreed on sticking to standard terminology and explaining details. A deep
> technical blog plus a section on this topic in Apex doc would work.
>
> Thks,
> Amol
>
>
> On Tue, Feb 2, 2016 at 10:51 PM, Thomas Weise <[email protected]>
> wrote:
>
>> We should stick with standard terminology but make sure the differences are
>> well explained. That's necessary because other platforms use the same words
>> with different meanings; compare Storm, Spark Streaming and Flink.
>>
>> Take "exactly once" as an example. Elsewhere you will find it claimed when it
>> really is "at least once": events are replayed and computation is repeated.
>> Only when all operations in the overall system are idempotent is it
>> possible to avoid effects such as double counting, duplicate web service
>> calls or duplicate rows in the database. Hence, the engine alone cannot claim
>> to support "exactly once"; the claim is only valid when the operators used in
>> the application collectively support it.
>>
>> In Apex, the engine provides the hooks (endWindow, committed) to achieve
>> idempotency in operators that have an effect on external systems. There are
>> several implementations of operators that can be used with at-least-once
>> processing mode that will deliver "exactly-once" for the application when
>> all operations in the DAG are idempotent.
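
A minimal sketch of the idea in the paragraph above, in plain Java without the Apex API: an output operator buffers the tuples of a window and writes them to the external store in endWindow together with the window id, so that a replayed window is recognized and skipped. WindowedStore, IdempotentWriter and ReplayDemo are hypothetical names invented for this illustration; a real operator would need to persist the window id atomically with the data (for example in the same database transaction).

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates a failure after two windows, then a replay of all windows from the start. */
final class ReplayDemo {
    static List<String> run() {
        WindowedStore store = new WindowedStore();
        String[][] windows = {{"a", "b"}, {"c"}, {"d", "e"}};

        // First attempt: windows 0 and 1 complete, then the operator fails.
        IdempotentWriter writer = new IdempotentWriter(store);
        for (long w = 0; w < 2; w++) {
            emit(writer, w, windows[(int) w]);
        }

        // Recovery: a fresh operator instance receives windows 0..2 again (at-least-once).
        IdempotentWriter recovered = new IdempotentWriter(store);
        for (long w = 0; w < 3; w++) {
            emit(recovered, w, windows[(int) w]);
        }
        return store.rows();
    }

    private static void emit(IdempotentWriter writer, long windowId, String[] tuples) {
        writer.beginWindow(windowId);
        for (String tuple : tuples) {
            writer.process(tuple);
        }
        writer.endWindow();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical external store that remembers the id of the last window it accepted. */
final class WindowedStore {
    private final List<String> rows = new ArrayList<>();
    private long lastWindowId = -1;

    /** Applies a window's batch once; a window that was already written is ignored. */
    boolean writeWindow(long windowId, List<String> batch) {
        if (windowId <= lastWindowId) {
            return false; // replayed window: its effect is already in the store
        }
        rows.addAll(batch);
        lastWindowId = windowId;
        return true;
    }

    List<String> rows() {
        return new ArrayList<>(rows);
    }
}

/** Hypothetical output operator: buffers tuples per window and flushes them in endWindow. */
final class IdempotentWriter {
    private final WindowedStore store;
    private final List<String> buffer = new ArrayList<>();
    private long currentWindowId;

    IdempotentWriter(WindowedStore store) {
        this.store = store;
    }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
        buffer.clear();
    }

    void process(String tuple) {
        buffer.add(tuple);
    }

    void endWindow() {
        store.writeWindow(currentWindowId, buffer);
    }
}
```

Running ReplayDemo prints [a, b, c, d, e]: windows 0 and 1 are processed twice because of the replay, but their effect on the store is applied only once.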
>>
>>
>>
>>
>>
>> On Tue, Feb 2, 2016 at 10:26 PM, Shubham Pathak <[email protected]>
>> wrote:
>>
>>> +1 for adding detailed explanation about the concepts in tutorials.
>>>
>>>
>>> On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <[email protected]>
>>> wrote:
>>>
>>>> +1 for Vlad's suggestion. Searching for keywords like "at least once",
>>>> "at most once" and "exactly once" shows that these terms are widely
>>>> used wherever semantics are defined for tuple processing.
>>>> Adding example applications for each of them would help explain the
>>>> terminology in the Apex context.
>>>>
>>>> On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <[email protected]>
>>>> wrote:
>>>>
>>>>> I agree with Vlad. It would be good to have a clear explanation, with
>>>>> examples, of the existing names, as that will not create confusion
>>>>> either for those who already know them or for beginners.
>>>>>
>>>>> On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I agree with Vlad too.
>>>>>>
>>>>>> Thks
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I agree with Vlad: these names are so deeply embedded in the community
>>>>>>> that changing them is likely to create more problems than it solves.
>>>>>>>
>>>>>>> Ram
>>>>>>>
>>>>>>> On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I vote to keep the original names and educate/explain their meaning to
>>>>>>>> a non-technical audience, as the delivery guarantee is not specific to
>>>>>>>> Apex but has a common meaning for all streaming platforms.
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 2/2/16 15:17, Timothy Farkas wrote:
>>>>>>>>>
>>>>>>>>> Could we provide Processing and Output Centric Aliases for the
>>>>>>>>> ProcessingModes?
>>>>>>>>>
>>>>>>>>> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
>>>>>>>>> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>>>>>>>>>
>>>>>>>>> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
>>>>>>>>> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
>>>>>>>>> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>>>>>>>>>
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Well, output guarantees are managed by the operators themselves, so the
>>>>>>>>>> user will typically not see them as part of the engine features. They
>>>>>>>>>> only see processing guarantees, and while those are technically correct
>>>>>>>>>> as far as individual operators are concerned, the names give a
>>>>>>>>>> different idea.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think I understand the ambiguity you are trying to clear up, Pramod.
>>>>>>>>>>> Perhaps it can be disambiguated by distinguishing between Processing
>>>>>>>>>>> Guarantees and Output Guarantees when explaining to people. Processing
>>>>>>>>>>> Guarantees apply to the way tuples are transmitted between operators.
>>>>>>>>>>> Output Guarantees apply to the way output operators write tuples to a
>>>>>>>>>>> Data Sink.
>>>>>>>>>>>
>>>>>>>>>>> This way we can describe each term intuitively in each context:
>>>>>>>>>>>
>>>>>>>>>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
>>>>>>>>>>> At Least Once: A tuple can be transmitted (written) one or more times.
>>>>>>>>>>> Exactly Once: A tuple is transmitted (written) only once.
>>>>>>>>>>>
>>>>>>>>>>> Then we could provide a table with the strongest Output Guarantee that
>>>>>>>>>>> is possible for each Processing Guarantee.
>>>>>>>>>>>
>>>>>>>>>>> Processing    | Strongest Output Guarantee
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>> At Most Once  | At Most Once
>>>>>>>>>>> At Least Once | Exactly Once
>>>>>>>>>>> Exactly Once  | Exactly Once
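
The table above can also be read as a small lookup. The sketch below is plain Java with hypothetical names (Guarantee, OutputGuarantees, strongestOutput) and is not the Apex ProcessingMode enum. It assumes, as discussed elsewhere in this thread, that at-least-once processing reaches exactly-once output only when the output operators themselves are idempotent.

```java
/** Prints, for each processing guarantee, the strongest output guarantee from the table. */
final class OutputGuarantees {
    static Guarantee strongestOutput(Guarantee processing) {
        switch (processing) {
            case AT_MOST_ONCE:
                // Dropped tuples are never replayed, so output cannot be stronger.
                return Guarantee.AT_MOST_ONCE;
            case AT_LEAST_ONCE:
                // Assumption: replay combined with idempotent output operators.
            case EXACTLY_ONCE:
                return Guarantee.EXACTLY_ONCE;
            default:
                throw new IllegalArgumentException("unknown guarantee: " + processing);
        }
    }

    public static void main(String[] args) {
        for (Guarantee g : Guarantee.values()) {
            System.out.println(g + " -> " + strongestOutput(g));
        }
    }
}

/** Hypothetical stand-in for the three guarantee names used in the thread. */
enum Guarantee { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }
```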
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Tim
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I agree with Tim. Instead of new terminology, a better explanation of
>>>>>>>>>>>> the existing terms is more useful.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The idea is to disambiguate without using "at least once", since
>>>>>>>>>>>>> exactly-once output can still be achieved with it. Any other names
>>>>>>>>>>>>> are fine; those were just suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The new names don't make as much sense to me as the original names. The
>>>>>>>>>>>>>> concepts require some thought to understand, and that won't necessarily
>>>>>>>>>>>>>> be made easier with a name change. I think a better way to attack
>>>>>>>>>>>>>> misunderstandings is to clearly explain what a window, operator, input
>>>>>>>>>>>>>> operator, output operator, tuple, checkpoint, and DAG are, with really
>>>>>>>>>>>>>> clean and simple illustrations of the concepts. Then we can explain more
>>>>>>>>>>>>>> involved concepts like At Least Once, At Most Once, and Exactly Once
>>>>>>>>>>>>>> with well-thought-out illustrations. Without a clear explanation of the
>>>>>>>>>>>>>> basic vocabulary, and without pictures, it is difficult to get even
>>>>>>>>>>>>>> technical people to understand these concepts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Today we support three different processing modes for operators, "at
>>>>>>>>>>>>>>> least once", "at most once" and "exactly once", which determine tuple
>>>>>>>>>>>>>>> processing and recovery behavior when an operator recovers from
>>>>>>>>>>>>>>> failure. The default is at least once, where the tuples are replayed
>>>>>>>>>>>>>>> from the recovered checkpoint.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At least once works well for most applications. Typically applications
>>>>>>>>>>>>>>> persist the final output of processing through the DAG into various
>>>>>>>>>>>>>>> outputs like key value stores, databases or even HDFS files. In many of
>>>>>>>>>>>>>>> these cases various strategies can be employed to save the data
>>>>>>>>>>>>>>> "exactly once" in the output, such as transactions, rewinding, metadata
>>>>>>>>>>>>>>> storage, idempotent operations etc. Furthermore, the exactly once
>>>>>>>>>>>>>>> processing mode, which is a checkpoint performed every window, is
>>>>>>>>>>>>>>> rarely used. All this leads to confusion, especially for somebody new,
>>>>>>>>>>>>>>> and also makes it difficult to explain these names to a less technical
>>>>>>>>>>>>>>> audience in meetups and public forums.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I am proposing is only a name change which will make this more
>>>>>>>>>>>>>>> intuitive to understand. Something simple like "repeat" for "at least
>>>>>>>>>>>>>>> once", "latest" for "at most once" and "repeat latest" for "exactly
>>>>>>>>>>>>>>> once" can do the trick.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>
