[
https://issues.apache.org/jira/browse/ZOOKEEPER-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Akihiro Suda updated ZOOKEEPER-2162:
------------------------------------
Attachment: ZOOKEEPER-2162.patch
A naive patch for ZOOKEEPER-2162.
It shuts down the server when the leader's epoch < the accepted epoch.
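The epoch comparison described above can be sketched roughly as follows. This is only an illustration of the idea, not the contents of the attached patch: the class name EpochGuard, the Action enum, and the decide method are hypothetical, and the epoch values in main are illustrative rather than taken from a real run.

```java
// Hypothetical sketch; none of these names come from the ZooKeeper code base.
public class EpochGuard {

    /** What a server could do after comparing epochs with the leader. */
    enum Action { ACCEPT_NEW_EPOCH, KEEP_EPOCH, SHUT_DOWN }

    /**
     * Compare the epoch proposed by the leader with the epoch this
     * server has already accepted.
     */
    static Action decide(long leaderEpoch, long acceptedEpoch) {
        if (leaderEpoch > acceptedEpoch) {
            // Normal case: the leader is ahead, so adopt its epoch.
            return Action.ACCEPT_NEW_EPOCH;
        }
        if (leaderEpoch == acceptedEpoch) {
            // Already in the same epoch; nothing to update.
            return Action.KEEP_EPOCH;
        }
        // The leader is behind what this server has accepted. Retrying
        // cannot change the outcome, so stop instead of looping forever.
        return Action.SHUT_DOWN;
    }

    public static void main(String[] args) {
        // Illustrative values: a leader that lost its state proposes an
        // epoch lower than the accepted epoch of 2 from the report.
        System.out.println(decide(1L, 2L));
    }
}
```

The point of returning an explicit SHUT_DOWN action is that the failure becomes a single terminal event rather than an exception that is caught and retried indefinitely.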
> infinite exception loop occurs when dataDir is lost
> ---------------------------------------------------
>
> Key: ZOOKEEPER-2162
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2162
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.0
> Reporter: Akihiro Suda
> Attachments: ZOOKEEPER-2162.patch
>
>
> The following sequence puts server.1 and server.2 into an infinite exception loop.
> * Start server.1 and server.2 with the initial ensemble
> server.1=participant, server.2=observer.
> At this point, acceptedEpoch\[i\] == currentEpoch\[i\] == 1 for i = 1, 2.
> * Invoke reconfig so that acceptedEpoch\[i\] and currentEpoch\[i\] grow
> to 2.
> * Kill server.2
> * Remove the dataDir of server.2, except for the myid file.
> (In real production environments, both confDir and dataDir can be lost
> due to reprovisioning.)
> * Start server.2
> * server.1 and server.2 enter an infinite exception loop.
> The log size (with the threshold set to INFO in log4j.properties) can exceed
> 100MB in 30 seconds.
> AFAIK, the bug can be reproduced with
> ZooKeeper@f5fb50ed2591ba9a24685a227bb5374759516828 (Apr 7, 2015).
> I made a Docker container so that anyone interested can reproduce the
> bug easily. (Sorry, there are no JUnit tests yet.)
> {noformat}
> $ docker run -i -t --rm akihirosuda/zookeeper-bug01
> Reproducing the bug: infinite exception loop occurs when dataDir is lost
> * Resetting
> * Starting [1,2] with the initial ensemble [1]
> * Sleeping for 3 seconds
> * Invoking Reconfig [1]->[2]
> * Sleeping for 3 seconds
> * Killing server.2 (pid=10542)
> * Sleeping for 3 seconds
> * Resetting /zk02_data
> * Starting server.2
> * Sleeping for 30 seconds
> /zk01_log: 81665114 bytes
> The log dir is extremely large. Perhaps the bug was REPRODUCED!
> /zk02_log: 23949367 bytes
> The log dir is extremely large. Perhaps the bug was REPRODUCED!
> * Exiting
> {noformat}