[
https://issues.apache.org/jira/browse/SAMZA-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shanthoosh Venkataraman updated SAMZA-2301:
-------------------------------------------
Summary: Add non-null checks in JobModel read control-flow in standalone.
(was: Improve zookeeper metadata-store implementation)
> Add non-null checks in JobModel read control-flow in standalone.
> ----------------------------------------------------------------
>
> Key: SAMZA-2301
> URL: https://issues.apache.org/jira/browse/SAMZA-2301
> Project: Samza
> Issue Type: New Feature
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> - We've observed corner cases in which not all instances of a standalone
> application see the same state in zookeeper, i.e., some instances see
> the up-to-date JobModel state and some see an outdated, inconsistent state.
> There is a minuscule propagation delay (depending upon network bandwidth) between
> the leader of the zookeeper quorum and the other servers in the ensemble.
> Consider the case where a follower undergoes the following execution
> sequence.
> - Follower receives a JobModel version change notification from a zookeeper
> server on which the JobModel version update had been made by the standalone
> leader processor.
> - Follower tries to read the JobModel, but receives a session disconnect from
> the up-to-date zookeeper server. The I0Itec ZkClient library retries connecting to
> other servers in the ensemble, and a connection to an outdated zookeeper server
> is established successfully. The read for the JobModel from this outdated
> zookeeper server returns null for the new JobModel zookeeper path,
> thereby killing the standalone processor.
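
The failure sequence above suggests guarding the read with a non-null check
instead of letting a null JobModel propagate. The following is a minimal
sketch of that idea, not the actual Samza patch: the class JobModelReadSketch,
the method readNonNull, and the retry/backoff parameters are all hypothetical
names introduced for illustration, and the Supplier stands in for whatever
call reads the JobModel from zookeeper.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper (not part of Samza): retries a read that may
// transiently return null, e.g. because the connected zookeeper server
// has not yet caught up with the leader of the quorum.
class JobModelReadSketch {

    static <T> T readNonNull(Supplier<T> reader, int maxAttempts, long backoffMs) {
        Objects.requireNonNull(reader, "reader");
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = reader.get();
            if (value != null) {
                return value; // the server has the new JobModel
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs); // give the lagging server time to sync
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IllegalStateException("Interrupted while waiting to re-read", e);
                }
            }
        }
        // Fail with an explicit message rather than a later NullPointerException.
        throw new IllegalStateException(
            "Read returned null after " + maxAttempts + " attempts");
    }
}
```

A caller would wrap its existing JobModel read in the Supplier and choose
maxAttempts and backoffMs to cover the expected propagation delay. Whether to
retry, or to fail fast with a clear error, is a design choice this sketch
leaves open.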
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)