[
https://issues.apache.org/jira/browse/ZOOKEEPER-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136721#comment-17136721
]
Aaron Bandt commented on ZOOKEEPER-3826:
----------------------------------------
I understand that this behavior is intended to ensure data consistency, but it
really presents a problem when upgrading. We have low-volume development and QA
clusters that have been up for days with only one ZooKeeper server having a
snapshot. Manually copying files between ZooKeeper servers, or waiting an
unknown amount of time for ZooKeeper to create a snapshot, doesn't work well in
an environment where the configuration is managed by automation.

Since the new version requires a snapshot to be present, it would make sense
for it to create one immediately upon successful startup, especially since
keeping {{snapshot.trust.empty}} in the config for any length of time is
undesirable. ZooKeeper seems to do this on a fresh installation of the latest
version, so I'm not sure why it doesn't on an upgrade.
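
To illustrate what automation has to do today, here is a rough sketch of a
check a deployment tool could run on each server before dropping the flag. It
assumes the usual on-disk layout, where snapshots are files named
snapshot.<zxid> under the version-2 subdirectory of the snapshot directory.
The function names and the example path are made up for illustration; nothing
here is part of ZooKeeper itself.

```python
from pathlib import Path


def has_snapshot(data_dir):
    """Return True if data_dir/version-2 contains at least one snapshot.* file.

    Assumes the standard layout: <dataDir>/version-2/snapshot.<zxid>.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        return False
    return any(
        p.is_file() and p.name.startswith("snapshot.")
        for p in version_dir.iterdir()
    )


def trust_empty_setting(data_dir):
    """Render the config line a (hypothetical) automation template would emit.

    Keeps snapshot.trust.empty=true only while no snapshot exists on disk.
    """
    value = "false" if has_snapshot(data_dir) else "true"
    return f"snapshot.trust.empty={value}"
```

Calling trust_empty_setting("/var/lib/zookeeper/data") on each server (the path
is only an example) would keep the flag enabled just on the servers that still
lack a snapshot, which is the per-server bookkeeping I would rather not need.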
> upgrade from 3.4.x to 3.5.x
> ---------------------------
>
> Key: ZOOKEEPER-3826
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3826
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.7
> Environment: Kubernetes
> Reporter: Aldan Brito
> Priority: Critical
>
> While upgrading ZooKeeper from 3.4.14 to 3.5.7, we hit the snapshot issue
> described in
> https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> After setting the property "snapshot.trust.empty=true", the upgrade was
> successful.
> After reverting the flag to "snapshot.trust.empty=false" and restarting the
> ZooKeeper pods, one of the ZooKeeper servers fails with a similar stack
> trace: no snapshot found.
> {code:java}
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR", "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper", "time":"2020-05-12T08:32:17.685Z", "timezone":"UTC", "log":{"message":"main - org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk"}}
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR", "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper", "time":"2020-05-12T08:32:17.764Z", "timezone":"UTC", "log":{"message":"main - org.apache.zookeeper.server.quorum.QuorumPeerMain - Unexpected exception, exiting abnormally"}}
> java.lang.RuntimeException: Unable to run quorum server
>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:938)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> Caused by: java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
> {code}