[
https://issues.apache.org/jira/browse/SAMZA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Maes resolved SAMZA-1561.
------------------------------
Resolution: Fixed
The PR was merged and closed.
> JobModel upgrade consistency problem.
> -------------------------------------
>
> Key: SAMZA-1561
> URL: https://issues.apache.org/jira/browse/SAMZA-1561
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major
>
> The JobModel upgrade sequence is as follows:
> A. Read previousJobModelVersion from JobModelBasePath/jobModelVersion.
> B. Publish the new JobModel with version (previousJobModelVersion + 1) to
> JobModelBasePath/jobModels.
> C. Create a barrier with version (previousJobModelVersion + 1).
> D. Update the jobModelVersion path with the value
> (previousJobModelVersion + 1).
> Followers watch the jobModelVersion path for JobModel upgrades.
> If the leader dies before executing the last step of the upgrade sequence,
> any processor subsequently elected as leader will be unable to publish the
> new JobModel and will fail with ZkNodeExistsException. For instance, suppose
> the previous JobModel version is 2 in a processor group [P1, P2]. P1 is the
> leader; it creates the ZK node jobModelBasePath/jobModels/3 to publish the
> JobModel and dies without updating the jobModelVersion path (which stays
> at 2). If P2 becomes leader, it reads version 2, generates JobModel version
> 3, tries to create the node jobModelBasePath/jobModels/3, and fails because
> that node already exists.
> This behavior was observed while testing one of the Samza standalone jobs.
> JobModelBasePath/jobModels is the source of truth for the latest
> jobModelVersion in a processor group. Because the version is also maintained
> in a separate ZooKeeper node, and the two nodes cannot be updated
> atomically, we run into this consistency problem.
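
The failure described above can be reproduced with a small in-memory model. This is only a sketch: FakeZk, NodeExistsError, upgrade and next_version_from_job_models are hypothetical names invented for illustration, not Samza or ZooKeeper APIs, and the barrier in step C is left out because it does not affect the outcome. The last helper shows one possible way to avoid the collision, by deriving the next version from the children of the jobModels node (the source of truth named in the description); it is not necessarily the fix that was merged.

```python
class NodeExistsError(Exception):
    """Stand-in for ZkNodeExistsException (hypothetical, illustration only)."""


class FakeZk:
    """Minimal in-memory stand-in for a ZooKeeper client (not a real API)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        # Like a ZooKeeper create, this fails if the node already exists.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read(self, path):
        return self.nodes[path]

    def write(self, path, data):
        self.nodes[path] = data

    def children(self, path):
        prefix = path + "/"
        return [p[len(prefix):] for p in self.nodes if p.startswith(prefix)]


BASE = "jobModelBasePath"
VERSION_PATH = BASE + "/jobModelVersion"
MODELS_PATH = BASE + "/jobModels"


def upgrade(zk, job_model, crash_before_step_d=False):
    """Run upgrade steps A, B and D as the leader (step C, the barrier, is omitted)."""
    previous = int(zk.read(VERSION_PATH))                  # step A
    new_version = previous + 1
    zk.create(f"{MODELS_PATH}/{new_version}", job_model)   # step B
    if crash_before_step_d:
        return None                                        # leader dies here
    zk.write(VERSION_PATH, str(new_version))               # step D
    return new_version


def next_version_from_job_models(zk):
    """Assumed mitigation: take the next version from the published JobModels."""
    published = [int(child) for child in zk.children(MODELS_PATH)]
    return max(published, default=0) + 1


# Scenario from the description: the group is at version 2 and P1 is the leader.
zk = FakeZk()
zk.write(VERSION_PATH, "2")
zk.create(MODELS_PATH + "/2", "job-model-v2")

# P1 publishes jobModels/3 and dies before updating jobModelVersion.
upgrade(zk, "job-model-v3-from-P1", crash_before_step_d=True)

# P2 becomes leader, still reads version 2, and collides on jobModels/3.
failed_path = None
try:
    upgrade(zk, "job-model-v3-from-P2")
except NodeExistsError as err:
    failed_path = str(err)

version_after_failure = zk.read(VERSION_PATH)
safe_next_version = next_version_from_job_models(zk)
```

In this model the second upgrade raises on jobModelBasePath/jobModels/3 while jobModelVersion still reads 2, which matches the reported behavior. Reading the highest published version from jobModels instead yields 4, so a newly elected leader would not reuse the orphaned version.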