[ https://issues.apache.org/jira/browse/SAMZA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Maes resolved SAMZA-1561.
------------------------------
    Resolution: Fixed

The PR was merged and closed.

> JobModel upgrade consistency problem.
> -------------------------------------
>
>                 Key: SAMZA-1561
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1561
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> The JobModel upgrade sequence is the following:
> A. Read previousJobModelVersion from jobModelBasePath/jobModelVersion.
> B. Publish the new JobModel with version (previousJobModelVersion + 1) to
> jobModelBasePath/jobModels.
> C. Create a barrier with version (previousJobModelVersion + 1).
> D. Update the jobModelVersion path with the value (previousJobModelVersion + 1).
> Followers watch the jobModelVersion path for JobModel upgrades.
> If the leader dies before executing the last step of the upgrade sequence,
> then any processor subsequently elected as leader will be unable to publish
> the new JobModel and will fail with ZkNodeExistsException. For instance,
> suppose the previousJobModelVersion of a processor group [P1, P2] is 2. P1 is
> the leader; it creates the ZK node jobModelBasePath/jobModels/3 to publish
> the JobModel and dies without updating the jobModelVersion path (which stays
> at 2). If P2 becomes leader, it will generate JobModel version 3 again, try
> to create the node jobModelBasePath/jobModels/3, and fail.
> This behavior was observed during testing of one of the Samza standalone
> jobs.
> jobModelBasePath/jobModels is the source of truth for the latest
> jobModelVersion in a processor group. Because the version is also maintained
> in a separate ZooKeeper node that cannot be updated atomically with
> jobModels, we run into this consistency problem.
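
The failure described above can be reproduced with a minimal sketch. It is an illustration only: FakeZk is a hypothetical in-memory stand-in for a ZooKeeper client, NodeExistsError stands in for ZkNodeExistsException, and the function names are assumptions, not Samza code. The last function shows one possible mitigation (deriving the next version from the children of jobModels); it is not necessarily what the merged PR does.

```python
class NodeExistsError(Exception):
    """Stand-in for ZkNodeExistsException (hypothetical)."""


class FakeZk:
    """In-memory stand-in for a ZooKeeper client (hypothetical, not a real API)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        # Like a ZooKeeper create, this fails if the node already exists.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read(self, path):
        return self.nodes[path]

    def write(self, path, data):
        self.nodes[path] = data

    def children(self, path):
        # Assumes only direct children exist under `path`.
        prefix = path + "/"
        return [p[len(prefix):] for p in self.nodes if p.startswith(prefix)]


BASE = "jobModelBasePath"


def upgrade(zk, job_model, die_before_step_d=False):
    """Steps A, B and D of the upgrade sequence (barrier step C omitted)."""
    prev = int(zk.read(f"{BASE}/jobModelVersion"))            # step A
    new_version = prev + 1
    zk.create(f"{BASE}/jobModels/{new_version}", job_model)   # step B
    if die_before_step_d:
        return None                                           # leader dies here
    zk.write(f"{BASE}/jobModelVersion", str(new_version))     # step D
    return new_version


def next_version_from_job_models(zk):
    """Possible mitigation: treat the jobModels children as the source of truth."""
    published = [int(v) for v in zk.children(f"{BASE}/jobModels")]
    return max(published, default=0) + 1


# Processor group [P1, P2] at version 2.
zk = FakeZk()
zk.write(f"{BASE}/jobModelVersion", "2")
zk.create(f"{BASE}/jobModels/2", "model-v2")

# P1 (leader) publishes jobModels/3, then dies before updating jobModelVersion.
upgrade(zk, "model-v3-from-P1", die_before_step_d=True)

# P2 becomes leader, reads version 2 again, and collides on jobModels/3.
try:
    upgrade(zk, "model-v3-from-P2")
    p2_failed = False
except NodeExistsError:
    p2_failed = True
```

In this sketch P2 fails because it computes the new version from the stale jobModelVersion node, whereas next_version_from_job_models skips past the orphaned jobModels/3 entry.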



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
