[jira] [Commented] (ZOOKEEPER-4425) 4lw Command: On demand snapshot

Mate Szalay-Beko (Jira) Tue, 14 Dec 2021 23:25:05 -0800


    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459705#comment-17459705
 ]


Mate Szalay-Beko commented on ZOOKEEPER-4425:
---------------------------------------------

I can see why this would be useful, but on its own it will not solve the 
"problem" described above. Snapshots alone are not enough to restore the 
state: you always need both the snapshots and the write-ahead logs (a.k.a. 
"transaction logs"). See: 
https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_dataFileManagement

bq. The snapshot files stored in the data directory are fuzzy snapshots in the 
sense that during the time the ZooKeeper server is taking the snapshot, updates 
are occurring to the data tree. The suffix of the snapshot file names is the 
zxid, the ZooKeeper transaction id, of the last committed transaction at the 
start of the snapshot. Thus, the snapshot includes a subset of the updates to 
the data tree that occurred while the snapshot was in process. The snapshot, 
then, may not correspond to any data tree that actually existed, and for this 
reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can recover using 
this snapshot because it takes advantage of the idempotent nature of its 
updates. By replaying the transaction log against fuzzy snapshots ZooKeeper 
gets the state of the system at the end of the log.
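
To make the quoted paragraph concrete, here is a toy sketch of replaying a log 
on top of a fuzzy snapshot. It is not ZooKeeper code: the dict-based "data 
tree", the (zxid, path, value) transaction tuples and the restore() helper are 
all made up for illustration, and every transaction is a plain "set", which is 
trivially idempotent.

```python
# Toy model only: the "data tree" is a dict and every transaction is
# (zxid, path, value), meaning "set path to value". Not ZooKeeper's real format.
log = [
    (1, "/a", "a1"),
    (2, "/b", "b1"),
    (3, "/a", "a2"),
    (4, "/c", "c1"),
]

# The snapshot starts when zxid 2 is the last committed transaction, so it is
# labelled with zxid 2. While it is being written, zxid 3 and 4 are committed.
# The writer happens to pick up the effect of zxid 4 but not of zxid 3, so this
# exact tree never existed at any point in time: it is "fuzzy".
fuzzy_snapshot = {"/a": "a1", "/b": "b1", "/c": "c1"}
snapshot_zxid = 2


def restore(snapshot, snapshot_zxid, log):
    """Replay every transaction newer than the snapshot's zxid on top of it."""
    tree = dict(snapshot)
    for zxid, path, value in log:
        if zxid > snapshot_zxid:
            # Re-applying zxid 4 is harmless because "set" is idempotent.
            tree[path] = value
    return tree


restored = restore(fuzzy_snapshot, snapshot_zxid, log)
replayed_from_scratch = restore({}, 0, log)
```

Both restored and replayed_from_scratch end up as the same tree, while the 
fuzzy snapshot on its own still holds the stale value for /a. That stale value 
is exactly why the snapshot cannot be restored without the log.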

So in general, to be able to restore ZooKeeper you need to back up the last 
snapshot plus the transaction logs that were flushed since the start of that 
snapshot's creation. (To be on the safe side, people usually take the last 2-3 
snapshots plus the related transaction logs, so that recovery is still possible 
even if the latest snapshot is corrupted for some reason.)
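
The backup rule above could be sketched roughly like this. It assumes the 
usual file naming, snapshot.<zxid> and log.<zxid> with the zxid in hex, where 
the log suffix is the first zxid written to that log. The function name 
files_to_back_up and its snapshots_to_keep parameter are hypothetical, and the 
sketch works on a plain list of file names rather than on real directories.

```python
import re


def parse_zxid(name, prefix):
    """Return the zxid encoded in e.g. 'snapshot.1a2b', or None if no match."""
    match = re.fullmatch(re.escape(prefix) + r"\.([0-9a-fA-F]+)", name)
    return int(match.group(1), 16) if match else None


def collect(names, prefix):
    """Return (zxid, name) pairs for files with this prefix, oldest first."""
    found = []
    for name in names:
        zxid = parse_zxid(name, prefix)
        if zxid is not None:
            found.append((zxid, name))
    return sorted(found)


def files_to_back_up(names, snapshots_to_keep=3):
    """Pick the newest snapshots plus every log that may be needed to replay them.

    Hypothetical helper: an illustration of the rule, not a ZooKeeper tool.
    """
    snapshots = collect(names, "snapshot")
    logs = collect(names, "log")
    kept = snapshots[-snapshots_to_keep:]
    if not kept:
        # Without a snapshot, the state can only be rebuilt from all the logs.
        return [name for _, name in logs]

    oldest_snapshot_zxid = kept[0][0]
    # The log with the largest start zxid <= the snapshot zxid was the one
    # being written when the snapshot started, so it may hold newer
    # transactions. Keep it and every log after it.
    first_needed = max(
        (zxid for zxid, _ in logs if zxid <= oldest_snapshot_zxid), default=0
    )
    needed_logs = [name for zxid, name in logs if zxid >= first_needed]
    return [name for _, name in kept] + needed_logs


example = [
    "snapshot.0", "snapshot.64", "snapshot.c8", "snapshot.12c",
    "log.1", "log.65", "log.c9", "log.12d",
]
selected = files_to_back_up(example, snapshots_to_keep=2)
```

The selection is deliberately conservative: judging by file names alone, a log 
that starts before the oldest kept snapshot might still contain transactions 
that are newer than it, so that log is included as well.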

Forcing the generation of a snapshot doesn't really change this: you will still 
need the transaction logs to be able to restore. Of course, the more recent the 
snapshot is, the fewer transaction logs you need to replay, and in general the 
recovery (startup time) can be quicker.


> 4lw Command: On demand snapshot
> -------------------------------
>
>                 Key: ZOOKEEPER-4425
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4425
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: server
>            Reporter: Pablo Francisco Pérez Hidalgo
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Working with disaster recovery scenarios at work, we found that being able 
> to tell a ZooKeeper instance to take a snapshot, thus dumping its view of 
> the internal database into files, could be a last-resort escape hatch 
> against potential data loss.
> As an example, imagine that all the voting members of the ensemble are wiped 
> out due to a wrong deployment configuration change. A single surviving observer 
> could hold in its memory the last copy of the most recently updated ensemble 
> data. Sending it a _*snap*_ *four letter word command* that forced it to 
> save a snapshot of that information to disk could be a very convenient way 
> of recovering the database.
>  
> This issue aims to discuss the addition of this feature and serve as the gate 
> for an already available patch providing this feature.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
