[
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiafu Jiang updated ZOOKEEPER-3220:
-----------------------------------
Summary: The snapshot is not saved to disk and may cause data
inconsistency. (was: Snapshot is not written to disk and cause data
inconsistency.)
> The snapshot is not saved to disk and may cause data inconsistency.
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.12, 3.4.13
> Reporter: Jiafu Jiang
> Priority: Critical
>
> We know that the ZooKeeper server calls fsync to make sure that log data has
> been successfully saved to disk. But the ZooKeeper server does not call fsync
> to make sure that a snapshot has been successfully saved, which may cause
> problems, because closing a file descriptor does not guarantee that the data
> has been written to disk. See
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>
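The difference between closing a file and syncing it can be sketched in Java as
follows. This is only a minimal illustration, not ZooKeeper's code: the class
DurableSnapshotWrite and the method writeDurably are hypothetical names, and
the sketch assumes the snapshot is written through a plain FileOutputStream.
The one call that matters is FileDescriptor.sync(), which asks the operating
system to push its buffered data for the file to the storage device before
the stream is closed.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DurableSnapshotWrite {
    // Hypothetical helper (not part of ZooKeeper): write the bytes and
    // force them to the storage device before the stream is closed.
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
            // Flush any user-space buffering of the stream first.
            out.flush();
            // Ask the OS to write its buffers for this file to the device.
            // Closing the stream alone does not provide this guarantee.
            out.getFD().sync();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("snapshot", ".tmp");
        byte[] data = "snapshot-bytes".getBytes(StandardCharsets.UTF_8);
        writeDurably(file, data);
        System.out.println(Files.size(file)); // prints 14
        Files.delete(file);
    }
}
```

When the file is written through a FileChannel instead, FileChannel.force(true)
serves the same purpose.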
> If the snapshot is not successfully saved to disk, it may lead to data
> inconsistency. Here is an example, which is a real problem I have
> encountered.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3. zk2 was the
> leader.
> 2. Both zk1 and zk2 had the log records log 1 ~ log X, where X is the zxid.
> 3. The machine of zk1 restarted, and during the reboot, log(X+1) ~ log Y were
> saved to the log files of both zk2 (leader) and zk3 (follower).
> 4. After zk1 restarted successfully, it found itself to be a follower and
> began to synchronize its log with the leader. The leader sent a snapshot
> (records from log 1 ~ log Y) to zk1, and zk1 saved the snapshot to local disk
> by calling the method ZooKeeperServer.takeSnapshot. Unfortunately, when the
> method returned, the snapshot data had not yet been saved to disk. In fact,
> the snapshot file had been created, but its size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the
> leader. Say log(Y+1) ~ log Z were accepted by zk1 and saved to its log file.
> With fsync, zk1 can make sure that this log data is not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it was not used, so
> zk1 recovered using only the log files. But the records log(X+1) ~ log Y
> were lost!
>
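The gap described in steps 1 to 6 can be reproduced with a toy model. This is
only an illustrative sketch and not how ZooKeeper stores or replays data: a
node's state is reduced to the set of zxids it has applied, RecoveryGapDemo,
recover, and missing are hypothetical names, and the values X = 5, Y = 8, and
Z = 10 are made up for the example. zk1's log files hold 1 ~ X and
(Y+1) ~ Z; the records (X+1) ~ Y exist only in the snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class RecoveryGapDemo {
    // Toy recovery: apply everything covered by the snapshot, then replay
    // the log ranges. snapshotUpTo is the highest zxid in a usable
    // snapshot, or 0 when the snapshot file is empty and is skipped.
    static TreeSet<Long> recover(long snapshotUpTo, List<long[]> logRanges) {
        TreeSet<Long> applied = new TreeSet<>();
        for (long z = 1; z <= snapshotUpTo; z++) {
            applied.add(z);
        }
        for (long[] range : logRanges) {
            for (long z = range[0]; z <= range[1]; z++) {
                applied.add(z);
            }
        }
        return applied;
    }

    // Lists the zxids in 1..upTo that the recovered state does not contain.
    static List<Long> missing(TreeSet<Long> applied, long upTo) {
        List<Long> gaps = new ArrayList<>();
        for (long z = 1; z <= upTo; z++) {
            if (!applied.contains(z)) {
                gaps.add(z);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Hypothetical values: X = 5, Y = 8, Z = 10.
        List<long[]> zk1Logs = Arrays.asList(new long[] {1, 5}, new long[] {9, 10});
        // Snapshot reached the disk: nothing is missing.
        System.out.println(missing(recover(8, zk1Logs), 10)); // prints []
        // Snapshot file has size 0: records X+1 ~ Y are gone.
        System.out.println(missing(recover(0, zk1Logs), 10)); // prints [6, 7, 8]
    }
}
```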
> Sorry for my poor English.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)