[ 
https://issues.apache.org/jira/browse/SAMZA-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200850#comment-16200850
 ] 

ASF GitHub Bot commented on SAMZA-1454:
---------------------------------------

GitHub user prateekm opened a pull request:

    https://github.com/apache/samza/pull/324

    SAMZA-1454: Globally unique and user settable IDs for stateful operators

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/prateekm/samza operator-id-uniqueness

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #324
    
----
commit 53a872ce99761a806ab26878366feca938810702
Author: Prateek Maheshwari <[email protected]>
Date:   2017-10-11T19:30:28Z

    SAMZA-1454: Globally unique and user settable IDs for stateful operators

----


> Globally unique and user settable IDs for stateful operators
> ------------------------------------------------------------
>
>                 Key: SAMZA-1454
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1454
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Prateek Maheshwari
>            Assignee: Prateek Maheshwari
>             Fix For: 0.14.0
>
>
> Operators in the High Level API need globally unique IDs so that we can:
> 1. Create intermediate topics (e.g. for partitionBy).
> 2. Create operator state stores and their changelog topics.
> 3. Start processing from particular points in the DAG (multi-stage).
> Ideally, a particular operator's ID should be:
> 1. The same across job restarts with no code changes.
> 2. The same despite functionality-preserving changes in topology (e.g. 
> splitting a map into two maps).
> 3. Different if the semantics of an operator change without a change in 
> topology (e.g. a change in the map function logic).
> 1 is satisfied by a topology-based ID, but 2 & 3 are difficult for the 
> framework to determine automatically. All three are required to ensure 
> correctness of durable intermediate state in the DAG across restarts with 
> code changes. In other words, users need to be able to handle the following 
> scenarios:
> 1. If they make a code change that preserves the topology but changes the 
> semantics, they should be able to indicate that the downstream stateful 
> operators are now semantically different than before and the previous state 
> should be invalidated.
> 2. If they make a code change that changes the topology but preserves the 
> semantics, they should be able to indicate that the downstream stateful 
> operators are still the same as earlier and the previous state should be 
> reused.
> To support these two scenarios, we will allow (and recommend) users to 
> provide a custom ID for stateful operators (partitionBy, window and join 
> operators) that's unique in the DAG. We will use this ID as part of the name 
> for any durable state (streams, stores, changelogs) associated with the 
> operator. Whenever users make a code change that changes the topology or 
> semantics of the application, they should identify any affected operators 
> downstream of the change and change or preserve their operator IDs 
> accordingly.
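
For illustration only, here is a minimal sketch of how a user-provided
operator ID could be checked for uniqueness within a DAG and folded into the
names of durable state. The class OperatorIdRegistry, its methods, and the
"jobName-jobId-opType-opId" naming format are all hypothetical assumptions
made for this example; they are not taken from the pull request or from the
Samza API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Samza: tracks user-provided operator IDs
// for one DAG and derives names for the durable state tied to each operator.
final class OperatorIdRegistry {
    private final String jobName;
    private final String jobId;
    // user-provided operator ID -> operator type (e.g. "join", "window")
    private final Map<String, String> registered = new LinkedHashMap<>();

    OperatorIdRegistry(String jobName, String jobId) {
        this.jobName = jobName;
        this.jobId = jobId;
    }

    // Registers a user-provided ID and returns the base name used for the
    // operator's durable state. Rejects IDs that are empty or already used
    // by another operator in the same DAG.
    String register(String opType, String userOpId) {
        if (userOpId == null || userOpId.isEmpty()) {
            throw new IllegalArgumentException("Operator ID must not be empty");
        }
        if (registered.containsKey(userOpId)) {
            throw new IllegalStateException(
                "Operator ID '" + userOpId + "' is already used by a "
                    + registered.get(userOpId) + " operator");
        }
        registered.put(userOpId, opType);
        // Assumed naming format; the real format may differ.
        return String.format("%s-%s-%s-%s", jobName, jobId, opType, userOpId);
    }

    // Assumed convention: the changelog name is the state name plus a suffix.
    static String changelogName(String stateName) {
        return stateName + "-changelog";
    }
}
```

Under this sketch, keeping the same ID across a topology change yields the
same state name, so the previous state is reused. Changing the ID after a
semantic change yields a new name, so the previous state is effectively
invalidated.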



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
