Jian Fang created CURATOR-311:
---------------------------------
Summary: SharedValue could hold stale data when quorum membership changes
Key: CURATOR-311
URL: https://issues.apache.org/jira/browse/CURATOR-311
Project: Apache Curator
Issue Type: Bug
Components: Recipes
Affects Versions: 3.1.0
Environment: Linux
Reporter: Jian Fang
We run a ZooKeeper 3.5.1-alpha quorum on EC2 instances, and the quorum members
can change; for example, a peer can be replaced by a new EC2 instance after its
EC2 instance is terminated. We use Apache Curator 3.1.0 as the ZooKeeper
client. During our testing, we found that SharedValue could hold stale data
during and after the replacement of a peer, which led to a system failure.
I looked at the SharedValue code, and it seems to always return the value from
an in-memory reference variable that is only updated by a watcher. If the watch
is lost for any reason, the value never gets a chance to be updated again.
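The failure mode described above can be reproduced without a ZooKeeper cluster.
The sketch below is a hypothetical, stdlib-only Java model, not Curator or
ZooKeeper code: FakeNode, WatchOnlyCache and dropWatches() are illustrative
names. It assumes ZooKeeper-style one-shot watches, so each watch callback has
to re-read the data and re-arm itself, while getValue() only returns the cached
field and never consults the node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a value cache that is refreshed only by one-shot watches.
class WatchOnlyCacheDemo {

    // Illustrative stand-in for a znode; NOT the ZooKeeper API.
    static final class FakeNode {
        private String data;
        private final List<Runnable> watches = new ArrayList<>();

        FakeNode(String initial) { this.data = initial; }

        // Read the data and register a one-shot watch in a single step.
        synchronized String getDataAndWatch(Runnable watch) {
            watches.add(watch);
            return data;
        }

        // Write new data; each registered watch fires once and is then discarded.
        void setData(String newData) {
            List<Runnable> toFire;
            synchronized (this) {
                data = newData;
                toFire = new ArrayList<>(watches);
                watches.clear();
            }
            toFire.forEach(Runnable::run);
        }

        // Simulate a watch being lost, e.g. during a quorum membership change.
        synchronized void dropWatches() { watches.clear(); }
    }

    // Cache whose value changes only inside the watch callback.
    static final class WatchOnlyCache {
        private final FakeNode node;
        private volatile String cached;

        WatchOnlyCache(FakeNode node) {
            this.node = node;
            refresh();
        }

        // Re-read and re-arm the watch, because watches are one-shot.
        private void refresh() { cached = node.getDataAndWatch(this::refresh); }

        // Returns the in-memory copy; never reads the node.
        String getValue() { return cached; }
    }

    public static void main(String[] args) {
        FakeNode node = new FakeNode("v1");
        WatchOnlyCache cache = new WatchOnlyCache(node);

        node.setData("v2");
        System.out.println(cache.getValue()); // v2: the watch fired and re-armed

        node.dropWatches();                   // the watch is lost
        node.setData("v3");
        System.out.println(cache.getValue()); // still v2: stale from now on
    }
}
```

Once the watch is dropped nothing re-arms it, so in this model every later
getValue() call keeps returning v2 no matter how often the node changes.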
For now, I have worked around this issue by adding a connection state listener
that forces SharedValue to call readValue(), i.e., to read the data from
ZooKeeper directly, when the connection state changes to RECONNECTED.
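The reconnect-triggered re-read can be sketched as follows. In real Curator
code it would amount to registering a ConnectionStateListener through
client.getConnectionStateListenable().addListener(...) and reacting to
ConnectionState.RECONNECTED; the stdlib-only model below keeps only that shape.
Node, ConnState, setDataWhileWatchLost() and reconnect() are hypothetical
stand-ins, and refresh() plays the role of the readValue() call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model: a cache refreshed by its watch AND by a re-read on reconnect.
class ReconnectRefreshDemo {

    // Illustrative stand-in for a subset of Curator's ConnectionState values.
    enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    // Illustrative node with one-shot watches and connection-state listeners.
    static final class Node {
        private String data;
        private final List<Runnable> watches = new ArrayList<>();
        private final List<Consumer<ConnState>> listeners = new ArrayList<>();

        Node(String initial) { data = initial; }

        // Read the data and register a one-shot watch in a single step.
        synchronized String getDataAndWatch(Runnable watch) {
            watches.add(watch);
            return data;
        }

        synchronized void addStateListener(Consumer<ConnState> listener) {
            listeners.add(listener);
        }

        // Simulate a change whose notification is lost: pending watches are
        // discarded and the data changes without any callback firing.
        synchronized void setDataWhileWatchLost(String newData) {
            watches.clear();
            data = newData;
        }

        // Simulate the client reconnecting: every state listener sees RECONNECTED.
        void reconnect() {
            List<Consumer<ConnState>> copy;
            synchronized (this) { copy = new ArrayList<>(listeners); }
            copy.forEach(listener -> listener.accept(ConnState.RECONNECTED));
        }
    }

    static final class RefreshingCache {
        private final Node node;
        private volatile String cached;

        RefreshingCache(Node node) {
            this.node = node;
            // The workaround: on RECONNECTED, read the source directly again.
            node.addStateListener(state -> {
                if (state == ConnState.RECONNECTED) {
                    refresh();
                }
            });
            refresh();
        }

        // Re-read the data and re-arm the one-shot watch.
        private void refresh() { cached = node.getDataAndWatch(this::refresh); }

        String getValue() { return cached; }
    }

    public static void main(String[] args) {
        Node node = new Node("v1");
        RefreshingCache cache = new RefreshingCache(node);

        node.setDataWhileWatchLost("v2");
        System.out.println(cache.getValue()); // v1: the watch was lost
        node.reconnect();
        System.out.println(cache.getValue()); // v2: re-read on RECONNECTED
    }
}
```

In this model, re-reading on every RECONNECTED costs one extra read per
reconnect but bounds how long a lost watch can leave the cached value stale. It
does not help if a watch is lost without the client ever observing a reconnect.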
It would be great if this issue could be fixed in Curator directly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)