I agree with Vlad: these names are so deeply embedded in the community that
changing them is likely to create more problems than it solves.

Ram

On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <[email protected]> wrote:

> I vote to keep the original names and explain their meaning to non-technical
> audiences, as delivery guarantees are not specific to Apex but have a common
> meaning across all streaming platforms.
>
> Vlad
>
>
> On 2/2/16 15:17, Timothy Farkas wrote:
>
>> Could we provide Processing and Output Centric Aliases for the
>> ProcessingModes?
>>
>> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
>>
>> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
>> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
>> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
>>
>> Tim
>>
>> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <[email protected]>
>> wrote:
>>
>>> Well, output guarantees are managed by the operators themselves, so the
>>> user will typically not see them as part of the engine features; they only
>>> see processing guarantees. While the names are technically correct as far
>>> as individual operators are concerned, they give a different idea.
>>>
>>> Thanks
>>>
>>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <[email protected]>
>>> wrote:
>>>
>>>> I think I understand the ambiguity you are trying to clear up, Pramod.
>>>> Perhaps it can be disambiguated by distinguishing between Processing
>>>> Guarantees and Output Guarantees when explaining to people. Processing
>>>> Guarantees apply to the way tuples are transmitted between operators.
>>>> Output Guarantees apply to the way output operators write tuples to a
>>>> Data Sink.
>>>>
>>>> This way we can describe each term intuitively in each context:
>>>>
>>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
>>>> At Least Once: A tuple can be transmitted (written) one or more times.
>>>> Exactly Once: A tuple is transmitted (written) only once.
>>>>
>>>> Then we could provide a table with the strongest Output Guarantee that is
>>>> possible for each Processing Guarantee.
>>>>
>>>> Processing Guarantee | Strongest Output Guarantee
>>>> ---------------------+---------------------------
>>>> At Most Once         | At Most Once
>>>> At Least Once        | Exactly Once
>>>> Exactly Once         | Exactly Once
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <[email protected]>
>>>> wrote:
>>>>
>>>>> I agree with Tim. Instead of new terminology, better explanations of the
>>>>> existing ones are more useful.
>>>>>
>>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The idea is to disambiguate without using "at least once", since exactly
>>>>>> once output can still be achieved with those. Any other names are fine;
>>>>>> those were just suggestions.
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The new names don't make as much sense to me as the original names. The
>>>>>>> concepts require some thought to understand, and it won't necessarily be
>>>>>>> made easier with a name change. I think a better way to attack
>>>>>>> misunderstandings is to clearly explain what a window, operator, input
>>>>>>> operator, output operator, tuple, checkpoint, and DAG are, with really
>>>>>>> clean and simple illustrations of the concepts. Then we can explain more
>>>>>>> involved concepts like At Least Once, At Most Once, and Exactly Once with
>>>>>>> well-thought-out illustrations. Without a clear explanation of the basic
>>>>>>> vocabulary, and without pictures, it is difficult to get even technical
>>>>>>> people to understand these concepts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tim
>>>>>>>
>>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Today we support three different processing modes for operators, "at
>>>>>>>> least once", "at most once" and "exactly once", which determine tuple
>>>>>>>> processing and recovery behavior when an operator recovers from
>>>>>>>> failure. The default is at least once, where the tuples are replayed
>>>>>>>> from the recovered checkpoint.
>>>>>>>>
>>>>>>>> At least once works well for most applications. Typically applications
>>>>>>>> persist the final output of processing through the DAG into various
>>>>>>>> outputs like key value stores, databases or even HDFS files. In many of
>>>>>>>> these cases various strategies can be employed to save the data "exactly
>>>>>>>> once" in the output, such as transactions, rewinding, metadata storage,
>>>>>>>> idempotent operations etc. Furthermore, the exactly once processing mode,
>>>>>>>> which is a checkpoint performed every window, is rarely used. All this
>>>>>>>> leads to confusion, especially for somebody new, and also makes it
>>>>>>>> difficult to explain these names to a less technical audience in meetups
>>>>>>>> and public forums.
>>>>>>>>
>>>>>>>> What I am proposing is only a name change which will make this more
>>>>>>>> intuitive to understand. Something simple like "repeat" for "at least
>>>>>>>> once", "latest" for "at most once" and "repeat latest" for "exactly
>>>>>>>> once" can do the trick.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>
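
For readers skimming the thread, the alias proposal and the guarantee table
from Tim's messages above can be sketched in Java roughly as follows. This is
only an illustration under stated assumptions: the ProcessingMode enum here is
a local stand-in rather than the actual Apex class, and OutputGuarantee and
Guarantees.strongestOutput are hypothetical names that do not appear in the
thread.

```java
// Local stand-in for the engine's processing-mode enum (assumption: not the real Apex class).
enum ProcessingMode {
  AT_MOST_ONCE,
  AT_LEAST_ONCE,
  EXACTLY_ONCE;

  // Processing-centric aliases, as listed in the proposal.
  static final ProcessingMode AT_MOST_ONCE_PROCESSING = AT_MOST_ONCE;
  static final ProcessingMode AT_LEAST_ONCE_PROCESSING = AT_LEAST_ONCE;
  static final ProcessingMode EXACTLY_ONCE_PROCESSING = EXACTLY_ONCE;

  // Output-centric aliases: exactly-once output maps to at-least-once
  // processing, because the output operator itself removes the duplicates.
  static final ProcessingMode AT_MOST_ONCE_OUTPUT = AT_MOST_ONCE;
  static final ProcessingMode EXACTLY_ONCE_OUTPUT = AT_LEAST_ONCE;
}

// Hypothetical enum for the right-hand column of the table.
enum OutputGuarantee {
  AT_MOST_ONCE,
  EXACTLY_ONCE
}

// Hypothetical helper that encodes the table row by row.
final class Guarantees {
  private Guarantees() {}

  static OutputGuarantee strongestOutput(ProcessingMode mode) {
    switch (mode) {
      case AT_MOST_ONCE:
        return OutputGuarantee.AT_MOST_ONCE;
      case AT_LEAST_ONCE:
      case EXACTLY_ONCE:
        return OutputGuarantee.EXACTLY_ONCE;
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }
}
```

Because the aliases are plain static fields that refer to the same enum
constants, code comparing modes with == behaves identically whichever name is
used.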

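
Pramod's original message notes that under at least once processing, the
output can still be saved "exactly once" by using strategies such as
idempotent operations. A minimal sketch of that idea follows, assuming a
hypothetical in-memory key-value sink; IdempotentSink, ReplayDemo and the
tuple ids are made up for illustration and are not Apex APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a key-value store used as the output.
final class IdempotentSink {
  private final Map<Long, String> store = new HashMap<>();

  // Writing the same (tupleId, value) pair again leaves the store unchanged,
  // so a replayed tuple does not produce a duplicate record.
  void write(long tupleId, String value) {
    store.put(tupleId, value);
  }

  int size() {
    return store.size();
  }

  String get(long tupleId) {
    return store.get(tupleId);
  }
}

final class ReplayDemo {
  private ReplayDemo() {}

  // Simulates a failure followed by an at-least-once replay from a checkpoint.
  static IdempotentSink run() {
    IdempotentSink sink = new IdempotentSink();
    String[] tuples = {"a", "b", "c", "d"};

    // First attempt: tuples 0..2 reach the sink, then the operator fails.
    for (int i = 0; i < 3; i++) {
      sink.write(i, tuples[i]);
    }

    // Recovery: the checkpoint was taken before tuple 1, so tuples 1 and 2
    // are delivered a second time, followed by the remaining tuple 3.
    for (int i = 1; i < tuples.length; i++) {
      sink.write(i, tuples[i]);
    }
    return sink;
  }
}
```

Tuples 1 and 2 are delivered twice, yet the sink ends up holding each tuple
once, which is the sense in which at least once processing can still yield
exactly once output.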