[ 
https://issues.apache.org/jira/browse/CURATOR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Fang updated CURATOR-311:
------------------------------
    Description: 
We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum
membership can change: for example, a peer may be replaced by a new EC2
instance after an EC2 instance termination. We use Apache Curator 3.1.0 as the
ZooKeeper client. During our testing, we found that the SharedValue data
structure could hold stale data during and after the replacement of a peer,
which led to a system failure.

We looked into the SharedValue code. It seems to always return the value from
an in-memory reference variable, and that value is only updated by a watcher.
If, for any reason, the watch is lost, the value never gets a chance to be
updated again.
 
As a workaround, we added a connection state listener that forces SharedValue
to call readValue(), i.e., read the data from ZooKeeper directly, whenever the
connection state changes to RECONNECTED.
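
The failure mode and the workaround described above can be illustrated with a
small self-contained model. This is only a sketch and not Curator code: the
class CachedValueSketch, the nested CachedValue type, the local ConnectionState
enum, and the Supplier standing in for a direct ZooKeeper read are all
hypothetical names introduced for illustration. It assumes the cache is
refreshed only by a watch callback, and shows that forcing a re-read on
RECONNECTED recovers from a lost watch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class CachedValueSketch {
    // Local stand-in for a client connection state (hypothetical, not Curator's enum).
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Models a value cache that is normally refreshed only by watch callbacks.
    static final class CachedValue {
        // Stands in for a direct read from the server (assumption for this sketch).
        private final Supplier<String> store;
        private final AtomicReference<String> cached = new AtomicReference<>();

        CachedValue(Supplier<String> store) {
            this.store = store;
            readValue(); // initial load
        }

        // Always answers from memory, never from the server.
        String getValue() {
            return cached.get();
        }

        // Re-reads the authoritative value and replaces the cached copy.
        void readValue() {
            cached.set(store.get());
        }

        // Would be invoked by a watch; if the watch is lost, this never runs.
        void onWatchFired() {
            readValue();
        }

        // Workaround: force a direct read whenever the connection is re-established.
        void stateChanged(ConnectionState newState) {
            if (newState == ConnectionState.RECONNECTED) {
                readValue();
            }
        }
    }

    public static void main(String[] args) {
        AtomicReference<String> server = new AtomicReference<>("v1");
        CachedValue value = new CachedValue(server::get);

        // The server-side value changes, but the watch was lost: no callback fires.
        server.set("v2");
        System.out.println(value.getValue()); // prints "v1" (stale)

        // The reconnect listener forces a fresh read.
        value.stateChanged(ConnectionState.RECONNECTED);
        System.out.println(value.getValue()); // prints "v2"
    }
}
```

In a real client, the stateChanged method would be registered as a connection
state listener; how the forced read is wired into SharedValue itself depends on
the Curator version and is not shown here.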

It would be great if this issue could be fixed in Curator directly.


  was:
We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum
membership can change: for example, a peer may be replaced by a new EC2
instance after an EC2 instance termination. We use Apache Curator 3.1.0 as the
ZooKeeper client. During our testing, we found that the SharedValue data
structure could hold stale data during and after the replacement of a peer,
which led to a system failure.

We looked into the SharedValue code. It seems to always return the value from
an in-memory reference variable, and that value is only updated by a watcher.
If, for any reason, the watch is lost, the value never gets a chance to be
updated again.
 
As a workaround, I added a connection state listener that forces SharedValue
to call readValue(), i.e., read the data from ZooKeeper directly, whenever the
connection state changes to RECONNECTED.

It would be great if this issue could be fixed in Curator directly.



> SharedValue could hold stale data when quorum membership changes
> ----------------------------------------------------------------
>
>                 Key: CURATOR-311
>                 URL: https://issues.apache.org/jira/browse/CURATOR-311
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 3.1.0
>         Environment: Linux
>            Reporter: Jian Fang
>
> We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum
> membership can change: for example, a peer may be replaced by a new EC2
> instance after an EC2 instance termination. We use Apache Curator 3.1.0 as
> the ZooKeeper client. During our testing, we found that the SharedValue data
> structure could hold stale data during and after the replacement of a peer,
> which led to a system failure.
> We looked into the SharedValue code. It seems to always return the value
> from an in-memory reference variable, and that value is only updated by a
> watcher. If, for any reason, the watch is lost, the value never gets a chance
> to be updated again.
>  
> As a workaround, we added a connection state listener that forces
> SharedValue to call readValue(), i.e., read the data from ZooKeeper directly,
> whenever the connection state changes to RECONNECTED.
> It would be great if this issue could be fixed in Curator directly.



