[
https://issues.apache.org/jira/browse/ZOOKEEPER-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459705#comment-17459705
]
Mate Szalay-Beko commented on ZOOKEEPER-4425:
---------------------------------------------
I see some reasons why this would be useful, but on the other hand it will not
solve the "problem" described above. Snapshots are not enough by themselves
to restore the state; you always need both the snapshots and the write-ahead
logs (a.k.a. "transaction logs"). See:
https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_dataFileManagement
bq. The snapshot files stored in the data directory are fuzzy snapshots in the
sense that during the time the ZooKeeper server is taking the snapshot, updates
are occurring to the data tree. The suffix of the snapshot file names is the
zxid, the ZooKeeper transaction id, of the last committed transaction at the
start of the snapshot. Thus, the snapshot includes a subset of the updates to
the data tree that occurred while the snapshot was in process. The snapshot,
then, may not correspond to any data tree that actually existed, and for this
reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can recover using
this snapshot because it takes advantage of the idempotent nature of its
updates. By replaying the transaction log against fuzzy snapshots ZooKeeper
gets the state of the system at the end of the log.
So in general, if you need to be able to restore ZooKeeper, you simply need to
back up the last snapshot plus the transaction logs that were flushed since the
start of that snapshot's creation. (Usually, just to be on the safe side, you
take the last 2-3 snapshots plus the related transaction logs, so that you can
recover even if the latest snapshot is corrupted for some reason.)
Forcing the generation of a snapshot doesn't really change this; you will still
need the transaction logs to be able to restore. Of course, the more recent the
snapshots are, the fewer transaction logs you need, and in general the
recovery (startup time) can be quicker.
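To make the selection concrete, here is a rough sketch (not ZooKeeper code) of picking which files to back up. It assumes the usual file names in the version-2 directory, snapshot.<zxid> and log.<zxid> with a hexadecimal zxid suffix, where a log file's suffix is the zxid of the first transaction it contains. The function names and the keep_snapshots parameter are made up for illustration. For the oldest snapshot kept, it takes the latest log file that starts at or before that snapshot's zxid, plus every later log file, which is my understanding of how the server chooses logs to replay on startup.

```python
def parse_zxid(name, prefix):
    """Return the hex zxid suffix of e.g. 'snapshot.1f0' as an int, else None."""
    head, sep, suffix = name.partition(".")
    if head != prefix or not sep:
        return None
    try:
        return int(suffix, 16)
    except ValueError:
        return None


def zxid_sorted(names, prefix):
    """Return (zxid, name) pairs for files with the given prefix, oldest first."""
    pairs = []
    for name in names:
        zxid = parse_zxid(name, prefix)
        if zxid is not None:
            pairs.append((zxid, name))
    return sorted(pairs)


def select_backup_files(names, keep_snapshots=3):
    """Pick the newest snapshots and the transaction logs needed to replay them.

    `names` is a list of file names from the data/log directory (hypothetical
    input; listing the directory is left to the caller).
    """
    if keep_snapshots < 1:
        raise ValueError("keep_snapshots must be at least 1")

    snapshots = zxid_sorted(names, "snapshot")
    logs = zxid_sorted(names, "log")

    kept = snapshots[-keep_snapshots:]
    if not kept:
        # No snapshot at all: every log is needed to rebuild the state.
        return [], [name for _, name in logs]

    oldest_kept_zxid = kept[0][0]
    # The log that was being written when the oldest kept snapshot started is
    # the latest one whose first zxid is <= that snapshot's zxid.
    first_needed = max((z for z, _ in logs if z <= oldest_kept_zxid), default=0)
    needed_logs = [name for z, name in logs if z >= first_needed]
    return [name for _, name in kept], needed_logs
```

For example, given snapshot.100 through snapshot.400 and logs log.1, log.101, log.1f0, log.301, log.401, keeping three snapshots selects snapshot.200, snapshot.300, snapshot.400 together with log.1f0, log.301 and log.401. The older log.1 and log.101 only contain transactions already covered by snapshot.200.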
> 4lw Command: On demand snapshot
> -------------------------------
>
> Key: ZOOKEEPER-4425
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4425
> Project: ZooKeeper
> Issue Type: New Feature
> Components: server
> Reporter: Pablo Francisco Pérez Hidalgo
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Working with disaster recovery scenarios at work, we found that having the
> ability to tell a ZooKeeper instance to take a snapshot, thus dumping its
> view of the internal database into files, could be a last-resort escape
> hatch against potential data loss.
> As an example, imagine that all the voting members of the ensemble are wiped
> out due to a wrong deployment configuration change. A single surviving
> observer could hold in its memory the last copy of the most recently updated
> ensemble data. Sending it a _*snap*_ *four letter words command* that forced
> it to save a snapshot of that information to disk could be a very convenient
> way of recovering the database.
>
> This issue aims to discuss the addition of this feature and serve as the
> gateway for an already available patch providing this feature.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)