Pierre Yin created ZOOKEEPER-3713:
-------------------------------------

             Summary: ReadOnlyZooKeeperServer should not expose the 
uninitialized ZKDatabase to client during the snapshot loading.
                 Key: ZOOKEEPER-3713
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3713
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.6, 3.4.14, 3.6.0
            Reporter: Pierre Yin


A follower or observer may load a snapshot from disk or from the leader in some 
scenarios. If the network breaks during snapshot loading, the follower/observer 
may lose its connection to the leader. In the current design, a 
follower/observer switches into read-only mode immediately when its connection 
to the leader is broken, so it may become a ReadOnlyZooKeeperServer before the 
ZKDatabase has finished loading the snapshot. The time window between the 
successful switch to read-only mode and the completion of the ZKDatabase 
snapshot load is unsafe. 
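
The window can be illustrated with a small toy model. This is only a sketch 
under stated assumptions: ToyDatabase, ToyReadOnlyServer and UnsafeWindowDemo 
are hypothetical names made up for illustration and are not ZooKeeper classes. 
The model only shows that a server which starts serving reads before its 
database is populated answers "no such node" for data that does exist.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model, not ZooKeeper code: all names are illustrative.
class ToyDatabase {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();
    private volatile boolean initialized = false;

    // Simulates the end of snapshot loading: data is added, then the flag flips.
    void finishSnapshotLoad(Map<String, String> snapshot) {
        nodes.putAll(snapshot);
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    Optional<String> read(String path) {
        return Optional.ofNullable(nodes.get(path));
    }
}

class ToyReadOnlyServer {
    private final ToyDatabase db;

    ToyReadOnlyServer(ToyDatabase db) {
        this.db = db;
    }

    // Serves reads without checking whether the database has been loaded,
    // which is the behavior the report describes as unsafe.
    Optional<String> getData(String path) {
        return db.read(path);
    }
}

class UnsafeWindowDemo {
    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase();
        ToyReadOnlyServer server = new ToyReadOnlyServer(db);

        // Leader connection lost mid-load: read-only serving starts on an empty database.
        System.out.println("during window: " + server.getData("/config").isPresent());

        // Snapshot loading completes and the same read now succeeds.
        db.finishSnapshotLoad(Map.of("/config", "v1"));
        System.out.println("after window: " + server.getData("/config").isPresent());
    }
}
```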

The unsafe window may confuse Curator's NodeCache. If NodeCache's underlying 
reconnection hits the unsafe window, it may get a NoNode KeeperException for 
the specified path and clear the cache. Once the unsafe window has elapsed, 
NodeCache can see the data again.
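
The client-side effect can be sketched as follows. This is a simplified 
stand-in and not Curator's implementation: ToyServer, ToyNodeCache and 
CacheClearDemo are hypothetical names, and a null return plays the role of the 
NoNode KeeperException. The assumption is only that the cache re-reads its node 
after a reconnect and overwrites whatever it held before.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a server; a null result stands for "NoNode".
class ToyServer {
    final Map<String, String> nodes = new HashMap<>();

    String getData(String path) {
        return nodes.get(path);
    }
}

// Hypothetical single-node cache, loosely modeled on the behavior described
// in the report. This is not Curator's NodeCache.
class ToyNodeCache {
    private final String path;
    private String current;

    ToyNodeCache(String path, String initial) {
        this.path = path;
        this.current = initial;
    }

    // After a reconnect, re-read the node and overwrite the cached value.
    // A null result (no node) clears the cache.
    void onReconnected(ToyServer server) {
        current = server.getData(path);
    }

    String getCurrentData() {
        return current;
    }
}

class CacheClearDemo {
    public static void main(String[] args) {
        ToyServer server = new ToyServer();              // database still empty
        ToyNodeCache cache = new ToyNodeCache("/config", "v1");

        cache.onReconnected(server);                     // reconnect inside the window
        System.out.println("inside window: " + cache.getCurrentData());

        server.nodes.put("/config", "v1");               // snapshot finally loaded
        cache.onReconnected(server);
        System.out.println("after window: " + cache.getCurrentData());
    }
}
```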

This behavior is not correct. From the client's view, it gets a null value for 
a short period when the server ensemble's network is broken. Curator NodeCache 
is often used as a configuration source. Returning null is confusing and 
introduces logic errors in configuration scenarios.

I think the better behavior would be to reject all reconnections during the 
unsafe window. NodeCache still keeps the old data when a reconnection is 
rejected, which makes sense.
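
A minimal sketch of the proposed behavior, again as a toy model and not the 
actual patch: GatedServer, RejectedException, RetainingCache and RejectDemo are 
hypothetical names. It assumes the server refuses connections until its 
database is loaded and that the client cache leaves its value untouched when 
the connection attempt fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model; none of these classes exist in ZooKeeper or Curator.
class RejectedException extends Exception {
    RejectedException(String message) {
        super(message);
    }
}

class GatedServer {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();
    private volatile boolean initialized = false;

    void finishSnapshotLoad(Map<String, String> snapshot) {
        nodes.putAll(snapshot);
        initialized = true;
    }

    // Refuses new connections until the snapshot has been fully loaded.
    void connect() throws RejectedException {
        if (!initialized) {
            throw new RejectedException("database not initialized");
        }
    }

    String getData(String path) {
        return nodes.get(path);
    }
}

class RetainingCache {
    private final String path;
    private String current;

    RetainingCache(String path, String initial) {
        this.path = path;
        this.current = initial;
    }

    // On reconnect, refresh the value; if the server rejects us, keep the old one.
    void onReconnect(GatedServer server) {
        try {
            server.connect();
            current = server.getData(path);
        } catch (RejectedException e) {
            // Rejected during the unsafe window: leave 'current' untouched.
        }
    }

    String getCurrentData() {
        return current;
    }
}

class RejectDemo {
    public static void main(String[] args) {
        GatedServer server = new GatedServer();
        RetainingCache cache = new RetainingCache("/config", "v1");

        cache.onReconnect(server);                       // rejected, old value kept
        System.out.println("inside window: " + cache.getCurrentData());

        server.finishSnapshotLoad(Map.of("/config", "v2"));
        cache.onReconnect(server);                       // accepted, value refreshed
        System.out.println("after window: " + cache.getCurrentData());
    }
}
```

In this model the client never observes a missing node for data that exists; 
it sees either the previously cached value or the freshly loaded one.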

I will send my patch later. I hope someone can help review it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
