[
https://issues.apache.org/jira/browse/ZOOKEEPER-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105321#comment-17105321
]
Mate Szalay-Beko commented on ZOOKEEPER-3826:
---------------------------------------------
Can you check the file system to see if there is any snapshot file present on
the node in question?
If there is no snapshot file, then (with {{snapshot.trust.empty}} disabled)
this is not a bug: failing to start is the expected behaviour.
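A minimal sketch of such a check, assuming the usual on-disk layout where
snapshots are stored as {{snapshot.<zxid>}} and transaction logs as
{{log.<zxid>}} under a {{version-2}} subdirectory of {{dataDir}} (and of
{{dataLogDir}}, if that is configured separately). The function names are
made up for illustration, and the check only approximates the server's own
logic by looking at file names, not at file contents:

```python
from pathlib import Path


def list_zk_files(data_dir, data_log_dir=None):
    """Return (snapshot names, txn log names) found under the version-2 dirs.

    data_log_dir defaults to data_dir, which matches a setup where no
    separate dataLogDir is configured (an assumption of this sketch).
    """
    snap_dir = Path(data_dir) / "version-2"
    log_dir = Path(data_log_dir or data_dir) / "version-2"
    snapshots = sorted(p.name for p in snap_dir.glob("snapshot.*")) if snap_dir.is_dir() else []
    txn_logs = sorted(p.name for p in log_dir.glob("log.*")) if log_dir.is_dir() else []
    return snapshots, txn_logs


def would_fail_to_start(data_dir, data_log_dir=None):
    """True when log files exist but no snapshot file does.

    This mirrors the "No snapshot found, but there are log entries"
    condition only roughly: a log file with no entries is still counted.
    """
    snapshots, txn_logs = list_zk_files(data_dir, data_log_dir)
    return not snapshots and bool(txn_logs)
```

For example, {{would_fail_to_start("/data")}} (with {{/data}} standing in for
your actual {{dataDir}}) returning True on a node would suggest it is not yet
safe to disable {{snapshot.trust.empty}} there.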
Please note that in ZooKeeper all the servers take snapshots without
synchronizing with each other, so it is entirely possible for one server to
have no snapshot while other servers already have some. Please check our admin
guide for snapshotting-related parameters and details:
https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html
You have several ways to avoid this situation:
- wait until all servers have a snapshot file before disabling
{{snapshot.trust.empty}}
- copy the snapshots and log files from the leader before starting up the
server which has no snapshot (make sure you copy both snapshots and logs, as
due to the fuzzy snapshotting both are needed for a consistent view - see
https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_dataFileManagement)
- tune the {{snapCount}} or {{snapSizeLimitInKb}} parameters to instruct
ZooKeeper to take snapshots more frequently (at least during the period when
you still have {{snapshot.trust.empty=true}})
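The copy option above could be sketched roughly as follows. This assumes the
leader's data directory has already been fetched to a local path, that the
target server is stopped, and that snapshots and logs live in the same
{{version-2}} directory (no separate {{dataLogDir}}). The function name and
the paths are hypothetical:

```python
import shutil
from pathlib import Path


def copy_snapshots_and_logs(src_data_dir, dst_data_dir):
    """Copy snapshot.* and log.* files from src version-2 into dst version-2.

    Both kinds of files are copied together, since a fuzzy snapshot is only
    consistent when replayed with the matching transaction logs.
    Any other file in the source directory is left alone.
    """
    src = Path(src_data_dir) / "version-2"
    dst = Path(dst_data_dir) / "version-2"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in ("snapshot.*", "log.*"):
        for f in sorted(src.glob(pattern)):
            shutil.copy2(f, dst / f.name)  # copy2 also preserves timestamps
            copied.append(f.name)
    return copied
```

The returned list of file names can be logged or compared against the source
directory to confirm that both snapshots and logs were transferred.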
Forcing a snapshot with an admin command might be a good
improvement (I am not sure if there is any feature like this in ZooKeeper right
now). Maybe others know more...
> upgrade from 3.4.x to 3.5.x
> ---------------------------
>
> Key: ZOOKEEPER-3826
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3826
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.7
> Environment: Kubernetes
> Reporter: Aldan Brito
> Priority: Critical
>
> upgrade of zookeeper from 3.4.14 to 3.5.7
> We faced the snapshot issue which is described in
> https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> After setting the property "snapshot.trust.empty=true" the upgrade was
> successful.
> After reverting the flag to "snapshot.trust.empty=false" and restarting the
> ZooKeeper pods, one of the ZooKeeper servers fails with a similar
> "no snapshot found" stack trace.
> {code:java}
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR",
> "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper",
> "time":"2020-05-12T08:32:17.685Z", "timezone":"UTC", "log":{"message":"main -
> org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on
> disk"}}
> java.io.IOException: No snapshot found, but there are log entries. Something
> is broken!
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR",
> "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper",
> "time":"2020-05-12T08:32:17.764Z", "timezone":"UTC", "log":{"message":"main -
> org.apache.zookeeper.server.quorum.QuorumPeerMain - Unexpected exception,
> exiting abnormally"}}
> java.lang.RuntimeException: Unable to run quorum server
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:938)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> Caused by: java.io.IOException: No snapshot found, but there are log entries.
> Something is broken!
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
> {code}