Pierre Yin created ZOOKEEPER-3713:
-------------------------------------
Summary: ReadOnlyZooKeeperServer should not expose the
uninitialized ZKDatabase to client during the snapshot loading.
Key: ZOOKEEPER-3713
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3713
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.6, 3.4.14, 3.6.0
Reporter: Pierre Yin
A follower/observer may load a snapshot from disk or from the leader in some
scenarios. During the snapshot loading, the follower/observer may lose its
connection to the leader if the network breaks. In the current design, the
follower/observer switches into ReadOnly mode immediately when the connection
to the leader is lost, so it may become a ReadOnlyZooKeeperServer before the
snapshot has been fully loaded into the ZKDatabase. The time window between the
successful switch to ReadOnly mode and the completion of the ZKDatabase
snapshot loading is unsafe.
The unsafe window may confuse Curator's NodeCache. If NodeCache's underlying
reconnection hits the unsafe window, it may get a NoNode KeeperException for
the specified path and clear the NodeCache. Once the unsafe window has elapsed,
NodeCache can see the data again.
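
To make the sequence above concrete, here is a small toy model in Java. It is
only a sketch under my own assumptions: ToyReadOnlyServer and ToyNodeCache are
hypothetical stand-ins, not ZooKeeper or Curator classes, and an empty map
stands in for the ZKDatabase before the snapshot has been loaded.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: these classes are hypothetical, not ZooKeeper or Curator APIs.
final class ToyReadOnlyServer {
    // Stands in for the ZKDatabase; it is empty until the snapshot is loaded.
    private final Map<String, String> database = new ConcurrentHashMap<>();

    // Simulates the snapshot load completing.
    void finishSnapshotLoad(Map<String, String> snapshot) {
        database.putAll(snapshot);
    }

    // Reads are served even while the database is still empty (the unsafe window).
    Optional<String> read(String path) {
        return Optional.ofNullable(database.get(path));
    }
}

final class ToyNodeCache {
    private final String path;
    private String cached;

    ToyNodeCache(String path, String initialValue) {
        this.path = path;
        this.cached = initialValue;
    }

    // Models the reported behavior: a missing node clears the cached value.
    void refreshFrom(ToyReadOnlyServer server) {
        cached = server.read(path).orElse(null);
    }

    String current() {
        return cached;
    }
}
```

In this model, a refresh issued before finishSnapshotLoad() replaces a
previously cached value with null, and a refresh issued afterwards restores it.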
This behavior is not correct. From the client's view, it gets a null value for
a short period when the server ensemble's network is broken. Curator NodeCache
is often used as a configuration source. Returning null is confusing and
introduces logic errors in configuration scenarios.
I think the better behavior is to reject all reconnections during the unsafe
window. NodeCache then keeps the old data when the reconnection is rejected,
which makes sense.
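
A rough sketch of the rejection idea, again as a toy model in Java. This is not
the patch itself: GuardedReadOnlyServer, RetainingCache and the
databaseInitialized flag are hypothetical names I use for illustration, and the
real server would need to apply the check wherever it accepts new sessions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual patch and not a ZooKeeper API.
final class GuardedReadOnlyServer {
    private final Map<String, String> database = new ConcurrentHashMap<>();
    // Set to true only after the snapshot is fully loaded.
    private volatile boolean databaseInitialized = false;

    void finishSnapshotLoad(Map<String, String> snapshot) {
        database.putAll(snapshot);
        databaseInitialized = true; // publish only once the data is in place
    }

    // Returns false during the unsafe window so the client has to retry later.
    boolean acceptConnection() {
        return databaseInitialized;
    }

    String read(String path) {
        if (!databaseInitialized) {
            throw new IllegalStateException("database not initialized");
        }
        return database.get(path);
    }
}

final class RetainingCache {
    private final String path;
    private String cached;

    RetainingCache(String path, String initialValue) {
        this.path = path;
        this.cached = initialValue;
    }

    // A rejected reconnection leaves the previously cached value untouched.
    void reconnectAndRefresh(GuardedReadOnlyServer server) {
        if (!server.acceptConnection()) {
            return;
        }
        cached = server.read(path);
    }

    String current() {
        return cached;
    }
}
```

With this guard, a reconnection attempted before the snapshot is loaded is
refused and the cache keeps its old value instead of being cleared.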
I will send my patch later. I hope someone can help review it.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)