shanthoosh opened a new pull request #1137: Improve zookeper metadata implementation
URL: https://github.com/apache/samza/pull/1137

We've observed corner cases in which the instances of a standalone application do not all see the same state in ZooKeeper: some instances see the up-to-date JobModel state, while others see an outdated, inconsistent state. There is a small propagation delay (depending on network bandwidth) between the leader of the ZooKeeper quorum and the other servers in the ensemble.

Consider the case where a follower undergoes the following execution sequence:

- The follower receives a JobModel version change notification from a ZooKeeper server on which the JobModel version update made by the standalone leader processor has already been applied.
- The follower tries to read the JobModel, but receives a session disconnect from the up-to-date ZooKeeper server. The I0Itec ZkClient library retries connecting to other servers in the ensemble and successfully establishes a connection to an outdated ZooKeeper server.

The read of the JobModel from this outdated ZooKeeper server returns null for the new JobModel ZooKeeper path, thereby killing the standalone processor.

This patch solves the problem by adding a fixed number of retries in the metadata-store read path.
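The fixed-retry read described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class `RetryingRead`, the method `readWithRetries`, and its parameters are hypothetical names chosen for illustration, and the ZooKeeper read is stood in for by a plain `Supplier` rather than any real ZkClient call. The idea is only that a null result is treated as "possibly a stale server" and retried a bounded number of times before failing.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryingRead {

    /**
     * Hypothetical helper: invokes {@code read} until it returns a non-null
     * value or {@code maxAttempts} reads have been made, sleeping
     * {@code sleepMs} milliseconds between attempts.
     */
    public static <T> T readWithRetries(Supplier<T> read, int maxAttempts, long sleepMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = read.get();
            if (value != null) {
                return value; // the server we are connected to has caught up
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs); // give a lagging server time to catch up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalStateException("No value after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        // Simulated outdated server: the first two reads return null.
        String jobModel = readWithRetries(
            () -> reads.incrementAndGet() < 3 ? null : "jobModel-v2", 5, 10L);
        System.out.println(jobModel + " after " + reads.get() + " reads");
        // prints "jobModel-v2 after 3 reads"
    }
}
```

Bounding the number of attempts keeps the failure mode explicit: if the path is still missing after the last retry, the caller gets an exception instead of silently proceeding with a null JobModel.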
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
