Hi Sijie,

>> I am just curious why the change was made in such way.
It's a safety guarantee. Consider this case:

* An ensemble of servers A, B, and C. A and B have the most up-to-date transactions (let's say zxid + 1), while C is lagging one transaction behind (C has zxid). A is the current leader.
* A is partitioned away. At the same time, for some reason, B loses its snapshot file (for example, an admin runs 'rm -rf' on the entire dataDir by mistake).
* Now, with only B and C, if we don't do the check, B will be elected leader because it has the most up-to-date transaction (zxid + 1). The state of the ensemble will then be set to B's state, which is incorrect: although B has the most up-to-date transactions, it lost the older state that was in the missing snapshot file.
* In this case, we'd rather have the system stop working, by disallowing B from participating in leader election, than have a working system with incorrect state.

Note that the only case in which we allow an empty snapshot is when a server is bootstrapped as a new server joining the quorum.

>> Also can you advice the steps for people who using 3.4.x to upgrade to 3.5.4-beta

The only catch I remember is that if you are running a version older than 3.4.6 and are doing a rolling upgrade that keeps the quorum live, you need to upgrade to 3.4.6 first before upgrading to 3.5.x. See more at https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#ch_reconfig_upgrade .

On Mon, Jun 4, 2018 at 5:40 PM, Sijie Guo wrote:

> Hi zookeeper team,
>
> We hit an issue when upgrading from 3.4.x to 3.5.4-beta. Need some
> helps/advices from the community.
>
> ```
> 10:14:55.607 [main] INFO org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2181
> 10:14:55.623 [main] ERROR org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble - Exception while instantiating ZooKeeper
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:206) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:284) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:444) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runZookeeper(LocalBookkeeperEnsemble.java:126) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:242) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.PulsarStandaloneStarter.start(PulsarStandaloneStarter.java:171) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:266) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
> ```
>
> Looking into the source code,
> https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206
>
> A fix was introduced in https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> to throw exception when there is no snapshots and txn log is not empty.
>
> I am just curious why the change was made in such way.
> my feeling in a snapshotting-based store, if there is no snapshots but there are log
> entries, it usually doesn't mean the state was corrupted. I guess I might
> miss some context behind ZOOKEEPER-2325.
>
> Also can you advice the steps for people who using 3.4.x to upgrade to
> 3.5.4-beta?
>
> Thanks,
> Sijie
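To make the A/B/C scenario in my reply above concrete, here is a rough Java model of it. It is only a sketch, not ZooKeeper's actual code: `SnapshotSafetyCheck`, the `Server` holder, `restore`, and `electLeader` are hypothetical names. `restore` only mirrors the shape of the 3.5.4 check at FileTxnSnapLog.java#L206 and reuses its exception message from the log above. The election is deliberately simplified (it ignores epochs and server ids and just picks the highest zxid among servers that manage to start). The model also assumes B's transaction log survived the loss of its snapshot, for instance because the log lives in a separate dataLogDir.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical, simplified model of the scenario above; not ZooKeeper's real
 * FileTxnSnapLog or leader-election code.
 */
public class SnapshotSafetyCheck {

    /** What a server finds on disk at startup (hypothetical holder). */
    static final class Server {
        final String name;
        final boolean hasSnapshot;
        final long lastLoggedZxid; // -1 means the transaction log is empty

        Server(String name, boolean hasSnapshot, long lastLoggedZxid) {
            this.name = name;
            this.hasSnapshot = hasSnapshot;
            this.lastLoggedZxid = lastLoggedZxid;
        }
    }

    /**
     * Mirrors the shape of the 3.5.4 check: a missing snapshot is only
     * acceptable when the transaction log is also empty (a brand-new server).
     * Returns the zxid the server would advertise in leader election.
     */
    static long restore(Server s) throws IOException {
        if (!s.hasSnapshot) {
            if (s.lastLoggedZxid != -1) {
                throw new IOException(
                        "No snapshot found, but there are log entries. Something is broken!");
            }
            return 0; // new server: start from an empty database
        }
        return s.lastLoggedZxid; // snapshot present: log is assumed to replay on top of it
    }

    /**
     * Picks the highest-zxid server among those that manage to start, but only
     * if they form a strict majority of the ensemble.
     */
    static Optional<String> electLeader(List<Server> reachable, int ensembleSize,
                                        boolean enforceCheck) {
        int voters = 0;
        String leader = null;
        long bestZxid = Long.MIN_VALUE;
        for (Server s : reachable) {
            long zxid;
            if (enforceCheck) {
                try {
                    zxid = restore(s);
                } catch (IOException e) {
                    continue; // the server refuses to start, so it cannot vote
                }
            } else {
                zxid = s.lastLoggedZxid; // no check: trust the log even without a snapshot
            }
            voters++;
            if (zxid > bestZxid) {
                bestZxid = zxid;
                leader = s.name;
            }
        }
        if (voters <= ensembleSize / 2) {
            return Optional.empty(); // no quorum: the ensemble stops serving
        }
        return Optional.of(leader);
    }

    public static void main(String[] args) {
        long zxid = 100; // arbitrary starting zxid for the example
        Server b = new Server("B", false, zxid + 1); // snapshot lost, log kept
        Server c = new Server("C", true, zxid);      // one transaction behind
        List<Server> reachable = List.of(b, c);      // A is partitioned away

        System.out.println("without check: " + electLeader(reachable, 3, false));
        System.out.println("with check:    " + electLeader(reachable, 3, true));
    }
}
```

Running `main` should print `Optional[B]` for the run without the check and `Optional.empty` for the run with it. Without the check, B wins even though its state is missing whatever was in the lost snapshot; with the check, B refuses to start and C alone is one vote short of a majority of three, so the ensemble stays unavailable rather than serving incorrect state. A server with no snapshot and an empty log (the bootstrap case) still starts, with zxid 0.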
