[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105321#comment-17105321
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3826:
---------------------------------------------

Can you check the file system to see whether any snapshot file is present on 
the node in question?

If there is no snapshot file, then (with {{snapshot.trust.empty}} disabled) 
this is not a bug: failing to start is the expected behaviour. 
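
A minimal sketch of such a file system check, in Python. It assumes the usual on-disk layout, where snapshots are named {{snapshot.<zxid>}} and transaction logs {{log.<zxid>}} under a {{version-2}} subdirectory of {{dataDir}} (or of {{dataLogDir}} for the logs, if configured). The function names are hypothetical, and the check only looks at file names; the server additionally validates the snapshot contents, so treat the result as a hint only.

```python
from pathlib import Path


def snapshot_status(data_dir, data_log_dir=None):
    """Return (snapshot_names, txn_log_names) found under version-2.

    Assumption: data_log_dir is only passed when a separate dataLogDir
    is configured; otherwise logs live next to the snapshots.
    """
    snap_dir = Path(data_dir) / "version-2"
    log_dir = Path(data_log_dir or data_dir) / "version-2"
    snapshots = sorted(p.name for p in snap_dir.glob("snapshot.*")) if snap_dir.is_dir() else []
    txn_logs = sorted(p.name for p in log_dir.glob("log.*")) if log_dir.is_dir() else []
    return snapshots, txn_logs


def likely_fails_without_trust_empty(data_dir, data_log_dir=None):
    """True if there are log files but no snapshot file at all.

    This approximates the situation reported as "No snapshot found,
    but there are log entries" when snapshot.trust.empty is disabled.
    """
    snapshots, txn_logs = snapshot_status(data_dir, data_log_dir)
    return not snapshots and bool(txn_logs)
```

Calling {{likely_fails_without_trust_empty("/path/to/dataDir")}} on each node before reverting the flag should show which servers still lack a snapshot.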

Please note that in ZooKeeper all servers take snapshots without 
synchronizing with each other, so it is entirely possible for one server to 
have no snapshot while other servers already have some. Please check our admin 
guide for snapshot-related parameters and details: 
https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html

There are several ways to avoid this situation:
- wait until all servers have a snapshot file before disabling 
{{snapshot.trust.empty}}
- copy the snapshots and log files from the leader before starting up the 
server which has no snapshot (make sure you copy both snapshots and logs: 
because of fuzzy snapshotting, both are needed for a consistent view - see 
https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_dataFileManagement)
- tune the {{snapCount}} or {{snapSizeLimitInKb}} parameters so that 
ZooKeeper takes snapshots more frequently (at least during the period when 
you still have {{snapshot.trust.empty=true}})
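
As a rough illustration of the second option, the sketch below copies both the snapshot and the transaction log files from one data directory to another. It assumes that the leader's {{dataDir}} is already reachable as a local path (for example after copying it out of the pod), that the target server is stopped, and that no separate {{dataLogDir}} is in use. The function name and the paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_snapshots_and_logs(src_data_dir, dst_data_dir):
    """Copy snapshot.* and log.* files between version-2 directories.

    Both kinds of files are copied on purpose: with fuzzy snapshots,
    a snapshot alone is not enough for a consistent view.
    Returns the list of copied file names.
    """
    src = Path(src_data_dir) / "version-2"
    dst = Path(dst_data_dir) / "version-2"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in ("snapshot.*", "log.*"):
        for f in sorted(src.glob(pattern)):
            # copy2 also preserves timestamps, which helps when inspecting files later
            shutil.copy2(f, dst / f.name)
            copied.append(f.name)
    return copied
```

For example, {{copy_snapshots_and_logs("/tmp/leader-data", "/var/lib/zookeeper/data")}} would populate the empty server's data directory before it is started again; both paths here are placeholders.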

Forcing a snapshot with an admin command might be a good 
improvement (I am not sure whether ZooKeeper has any feature like this right 
now). Maybe others know more...

> upgrade from 3.4.x to 3.5.x
> ---------------------------
>
>                 Key: ZOOKEEPER-3826
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3826
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.7
>         Environment: Kubernetes 
>            Reporter: Aldan Brito
>            Priority: Critical
>
> Upgrade of ZooKeeper from 3.4.14 to 3.5.7.
> We hit the snapshot issue described in 
> https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> After setting the property "snapshot.trust.empty=true" the upgrade was 
> successful.
> However, after reverting the flag to "snapshot.trust.empty=false" and 
> restarting the ZooKeeper pods, one of the ZooKeeper servers fails with a 
> similar stack trace ("No snapshot found").
> {code:java}
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR", "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper", "time":"2020-05-12T08:32:17.685Z", "timezone":"UTC", "log":{"message":"main - org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk"}}
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {"type":"log", "host":"zk-testzk-0", "level":"ERROR", "neid":"zookeeper-4636c00bfc3849e0be179bc71cef17f8", "system":"zookeeper", "time":"2020-05-12T08:32:17.764Z", "timezone":"UTC", "log":{"message":"main - org.apache.zookeeper.server.quorum.QuorumPeerMain - Unexpected exception, exiting abnormally"}}
> java.lang.RuntimeException: Unable to run quorum server
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:938)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> Caused by: java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
