[ 
https://issues.apache.org/jira/browse/SAMZA-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-2301:
-------------------------------------------
    Description: 
- We've observed corner cases in which not all instances of a standalone 
application see the same state in ZooKeeper, i.e., some instances see the 
up-to-date JobModel state and some see an outdated, inconsistent state. There 
is a minuscule propagation delay (depending upon network bandwidth) between 
the leader of the ZooKeeper quorum and the other servers in the ensemble. 
Consider the case where a follower undergoes the following execution sequence. 
 - The follower receives a JobModel version change notification from a 
ZooKeeper server on which the JobModel version update had been made by the 
standalone leader processor. 
 - The follower tries to read the JobModel, but receives a session disconnect 
from the up-to-date ZooKeeper server. The I0Itec ZkClient library retries 
connecting to other servers in the ensemble, and a connection to an outdated 
ZooKeeper server is established successfully. The read of the JobModel from 
this outdated ZooKeeper server returns null for the new JobModel ZooKeeper 
path, thereby killing the standalone processor.
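
One possible mitigation for the sequence above, sketched under assumptions: 
treat a null read as possibly stale and retry a bounded number of times 
instead of failing immediately. The class StaleReadRetry and the method 
readWithRetry are hypothetical names used for illustration only; they are not 
part of Samza or ZkClient. The Supplier stands in for the actual JobModel read. 
ZooKeeper also offers a sync operation that asks the connected server to catch 
up with the leader before a read, which a real fix might call inside the 
supplier.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Samza or ZkClient. */
final class StaleReadRetry {
    private StaleReadRetry() {}

    /**
     * Calls reader until it returns a non-null value or maxAttempts is
     * exhausted, sleeping backoffMs between attempts. Returns null if every
     * attempt returned null, so the caller decides how to fail.
     */
    static <T> T readWithRetry(Supplier<T> reader, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // A null here may mean the connected server has not yet
            // received the latest write from the quorum leader.
            T value = reader.get();
            if (value != null) {
                return value;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while retrying read", e);
                }
            }
        }
        return null;
    }
}
```

A caller would wrap its JobModel read in the supplier and shut the processor 
down only if the result is still null after all attempts. The attempt count 
and backoff are tuning choices, not values taken from this issue.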

> Improve zookeeper metadata-store implementation
> -----------------------------------------------
>
>                 Key: SAMZA-2301
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2301
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major


