[jira] [Updated] (ZOOKEEPER-3713) ReadOnlyZooKeeperServer should not expose the uninitialized ZKDatabase to client during the snapshot loading.

Pierre Yin (Jira) Mon, 03 Feb 2020 02:55:22 -0800


     [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Pierre Yin updated ZOOKEEPER-3713:
----------------------------------
    Description: 
The Follower/Observer may load a snapshot from disk or from the leader in some
scenarios. During snapshot loading, the follower/observer may lose its
connection to the leader when the network is broken. In the current design, the
follower/observer switches into ReadOnly mode immediately when the network
connection to the leader is broken. So the follower/observer may become a
ReadOnlyZooKeeperServer before the ZKDatabase has finished loading the
snapshot. The time window between the follower/observer's successful switch to
ReadOnly mode and the completion of the ZKDatabase snapshot loading is unsafe.
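To make the unsafe window concrete, here is a minimal sketch under stated
assumptions. `SnapshotBackedDatabase` and `UnsafeWindowDemo` are hypothetical
classes, not ZooKeeper's ZKDatabase: nodes are restored one at a time, an
`initialized` flag flips only after the last node, and reads are served
without checking that flag, as a read-only server answering during the window
would do. A path that has not been restored yet looks exactly like NoNode.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not ZooKeeper's ZKDatabase): a snapshot is restored
// node by node, and `initialized` becomes true only after the last node.
class SnapshotBackedDatabase {
    private final Map<String, String> nodes = new HashMap<>();
    private volatile boolean initialized = false;

    // Restore a single node from the snapshot stream.
    void restoreNode(String path, String data) {
        nodes.put(path, data);
    }

    // Mark the snapshot as fully loaded.
    void finishLoading() {
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    // Unguarded read: a path not yet restored returns null, which a client
    // cannot distinguish from a genuine NoNode.
    String getData(String path) {
        return nodes.get(path);
    }
}

class UnsafeWindowDemo {
    public static void main(String[] args) {
        SnapshotBackedDatabase db = new SnapshotBackedDatabase();
        db.restoreNode("/app", "");
        // Assume the leader connection drops here and the server turns
        // read-only while "/app/config" has not been restored yet.
        System.out.println(db.getData("/app/config") == null); // true
        db.restoreNode("/app/config", "v1");
        db.finishLoading();
        System.out.println(db.getData("/app/config")); // v1
    }
}
```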

The unsafe window may confuse Curator's NodeCache. If NodeCache's underlying
reconnection hits the unsafe window, it may get a NoNode KeeperException for the
specified path and clear the NodeCache. Once the unsafe window has elapsed,
NodeCache can see the data again.
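The client-side effect can be sketched as a small state machine. This is a
model of the behavior described above, not Curator's API: `ConfigCache` and
`ReadOutcome` are hypothetical names, and the sketch assumes the cache clears
itself whenever a re-read after reconnection reports NoNode.

```java
import java.util.Optional;

// Hypothetical outcome of a re-read after reconnecting (not a Curator type).
enum ReadOutcome { FOUND, NO_NODE }

// Hypothetical stand-in for a NodeCache-style client cache.
class ConfigCache {
    private String current;

    void onReread(ReadOutcome outcome, String data) {
        if (outcome == ReadOutcome.FOUND) {
            current = data;
        } else {
            // NoNode: the cache treats the path as deleted and drops its data.
            current = null;
        }
    }

    Optional<String> get() {
        return Optional.ofNullable(current);
    }
}

class NodeCacheFlickerDemo {
    public static void main(String[] args) {
        ConfigCache cache = new ConfigCache();
        cache.onReread(ReadOutcome.FOUND, "v1");
        // Reconnection lands on a server inside the unsafe window.
        cache.onReread(ReadOutcome.NO_NODE, null);
        System.out.println(cache.get().isPresent()); // false
        // The window elapses and the next re-read succeeds again.
        cache.onReread(ReadOutcome.FOUND, "v1");
        System.out.println(cache.get().orElse("<none>")); // v1
    }
}
```

The brief `false` in the middle is the spurious null the application sees.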

This behavior is not correct. From the client's view, it gets a null value for
a short period when the ensemble's network is broken. Curator NodeCache is
often used as a configuration source, so returning null is confusing and
introduces logical errors in configuration scenarios.

I think the better behavior would be to reject all reconnection attempts during
the unsafe window. NodeCache then keeps the old data when the reconnection is
rejected, which makes sense.
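A minimal sketch of the proposed behavior, assuming the server can tell
whether its database has finished loading. `GuardedReadOnlyServer`,
`acceptConnection` and `RetainingCache` are hypothetical names for
illustration; they are not the actual patch or ZooKeeper classes. The server
refuses sessions until loading completes, and the client keeps its last value
whenever a connection is refused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only server (not ZooKeeper's ReadOnlyZooKeeperServer)
// that refuses clients until its database has finished loading.
class GuardedReadOnlyServer {
    private final Map<String, String> nodes = new HashMap<>();
    private volatile boolean databaseInitialized = false;

    void restoreNode(String path, String data) { nodes.put(path, data); }

    void finishLoading() { databaseInitialized = true; }

    // Reject the session outright instead of serving partial data.
    boolean acceptConnection() { return databaseInitialized; }

    String getData(String path) {
        if (!databaseInitialized) {
            throw new IllegalStateException("database not initialized");
        }
        return nodes.get(path);
    }
}

// Client-side cache that only replaces its value after an accepted connection.
class RetainingCache {
    private String current;

    RetainingCache(String initial) { current = initial; }

    void refresh(GuardedReadOnlyServer server, String path) {
        if (!server.acceptConnection()) {
            return; // rejected: keep the last known value and retry later
        }
        current = server.getData(path);
    }

    String get() { return current; }
}

class GuardDemo {
    public static void main(String[] args) {
        GuardedReadOnlyServer server = new GuardedReadOnlyServer();
        RetainingCache cache = new RetainingCache("v1");
        cache.refresh(server, "/app/config"); // rejected while loading
        System.out.println(cache.get());      // v1
        server.restoreNode("/app/config", "v2");
        server.finishLoading();
        cache.refresh(server, "/app/config"); // accepted after loading
        System.out.println(cache.get());      // v2
    }
}
```

Because the flag only ever goes from false to true, checking it before reading
cannot expose a half-loaded database in this model.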

I will send my patch later. I hope someone can help review it.



> ReadOnlyZooKeeperServer should not expose the uninitialized ZKDatabase to 
> client during the snapshot loading.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3713
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3713
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0, 3.4.14, 3.5.6
>            Reporter: Pierre Yin
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
