Zhuqi Jin created ZOOKEEPER-3848:
------------------------------------

             Summary: Zookeeper upgrade fails due to missing snapshots on branch-3.6
                 Key: ZOOKEEPER-3848
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3848
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.6.2
            Reporter: Zhuqi Jin
We tested upgrading a single-node ZooKeeper from branch-3.4/branch-3.5 to branch-3.6, but the upgraded node failed to start. The error message is as follows:
{code:java}
2020-05-24 00:24:24,996 [myid:1] - ERROR [main:ZooKeeperServerMain@90] - Unexpected exception, exiting abnormally
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:281)
	at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
	at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:484)
	at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:655)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:758)
	at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:159)
	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112)
	at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:140)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2020-05-24 00:24:24,999 [myid:1] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2020-05-24 00:24:25,001 [myid:1] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
{code}
The error can be reproduced with the following steps:
# Step 1: Start a single-node ZooKeeper (compiled from either branch-3.4 or branch-3.5) with the following configuration (zoo.cfg):
{code:java}
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.1=localhost:2888:3888
{code}
# Step 2: Use a ZooKeeper stress-testing tool, zk-smoketest ([https://github.com/phunt/zk-smoketest.git]), to test this node.
We invoked create, set, and get operations in zk-smoketest, but no delete operations, so the generated data was left on disk.
# Step 3: Upgrade the node to branch-3.6 with the same configuration.

After the upgrade, ZooKeeper failed to start with the error shown above. Following ZOOKEEPER-3056 and ZOOKEEPER-3513, we added
{code:java}
zookeeper.snapshot.trust.empty=true
{code}
to branch-3.6's configuration (zoo.cfg), but the node ran into the same failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
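To check a branch-3.4/branch-3.5 data directory for the condition behind this error before upgrading, a rough pre-flight check can look for transaction logs that have no accompanying snapshot. This is a hypothetical helper (`SnapshotPreflight` is not part of ZooKeeper) and a simplification. It assumes the default on-disk layout, in which transaction logs (`log.<zxid>`) and snapshots (`snapshot.<zxid>`) both live under `dataDir/version-2`, since `dataLogDir` is not set in the zoo.cfg above. It also inspects only file names, whereas `FileTxnSnapLog.restore` decides based on what it actually deserializes.

```java
import java.io.File;

/**
 * Hypothetical pre-upgrade check, not ZooKeeper code.
 * Assumes logs ("log.<zxid>") and snapshots ("snapshot.<zxid>") share dataDir/version-2.
 */
public class SnapshotPreflight {

    /** Returns true if the directory holds at least one log file but no snapshot file. */
    static boolean logsWithoutSnapshot(File versionDir) {
        File[] files = versionDir.listFiles();
        if (files == null) {
            return false; // directory missing or unreadable: nothing to compare
        }
        boolean hasSnapshot = false;
        boolean hasLog = false;
        for (File f : files) {
            String name = f.getName();
            if (name.startsWith("snapshot.")) {
                hasSnapshot = true; // also matches compressed snapshot suffixes
            } else if (name.startsWith("log.")) {
                hasLog = true;
            }
        }
        return hasLog && !hasSnapshot;
    }

    public static void main(String[] args) {
        // Defaults to dataDir=/tmp/zookeeper from the zoo.cfg in this report.
        String dataDir = args.length > 0 ? args[0] : "/tmp/zookeeper";
        File versionDir = new File(dataDir, "version-2");
        if (logsWithoutSnapshot(versionDir)) {
            System.out.println("WARNING: " + versionDir + " has transaction logs but no snapshot");
        } else {
            System.out.println("OK: " + versionDir);
        }
    }
}
```

Run it against the old node's data directory (for example `java SnapshotPreflight /tmp/zookeeper`) before starting the branch-3.6 build. A WARNING line suggests the upgraded node is likely to hit the `IOException` above, unless empty snapshots are trusted.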
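One thing worth ruling out for Step 3 (an assumption, not confirmed in this report): as we understand the admin guide, this setting is documented as `snapshot.trust.empty` in zoo.cfg, backed by the Java system property `zookeeper.snapshot.trust.empty`. The server config parser appears to copy unrecognised zoo.cfg keys into system properties with a `zookeeper.` prefix. If that holds on branch-3.6, the line added above would set `zookeeper.zookeeper.snapshot.trust.empty` and be ignored. The toy parser below (`TrustEmptyMapping` and its known-key list are hypothetical, not ZooKeeper code) only illustrates that prefixing convention, using `java.util.Properties` and `Boolean.getBoolean`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical illustration of the assumed zoo.cfg -> system property convention:
 * unrecognised keys become System properties named "zookeeper." + key.
 */
public class TrustEmptyMapping {

    // Keys this toy parser handles itself (taken from the zoo.cfg in the report).
    static final Set<String> KNOWN_KEYS =
            Set.of("tickTime", "initLimit", "syncLimit", "dataDir", "clientPort");

    // System property assumed to control trusting an empty snapshot.
    static final String TRUST_EMPTY = "zookeeper.snapshot.trust.empty";

    /** Parses zoo.cfg text and copies unrecognised keys into prefixed system properties. */
    static void applyConfig(String cfgText) throws IOException {
        Properties cfg = new Properties();
        cfg.load(new StringReader(cfgText));
        for (String key : cfg.stringPropertyNames()) {
            if (!KNOWN_KEYS.contains(key) && !key.startsWith("server.")) {
                System.setProperty("zookeeper." + key, cfg.getProperty(key));
            }
        }
    }

    /** Reads the flag the way a Boolean.getBoolean-based check would. */
    static boolean trustEmptySnapshot() {
        return Boolean.getBoolean(TRUST_EMPTY);
    }

    public static void main(String[] args) throws IOException {
        // Key as written in this report: ends up as "zookeeper.zookeeper.snapshot.trust.empty".
        applyConfig("zookeeper.snapshot.trust.empty=true\n");
        System.out.println("prefixed key in zoo.cfg   -> trust empty = " + trustEmptySnapshot());

        // Unprefixed key: ends up as "zookeeper.snapshot.trust.empty".
        applyConfig("snapshot.trust.empty=true\n");
        System.out.println("unprefixed key in zoo.cfg -> trust empty = " + trustEmptySnapshot());
    }
}
```

If the assumption holds, retrying with `snapshot.trust.empty=true` in zoo.cfg, or passing `-Dzookeeper.snapshot.trust.empty=true` directly on the JVM command line, would distinguish a configuration mismatch from a genuine restore bug.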