Ramnatthan Alagappan created ZOOKEEPER-2562:
-----------------------------------------------

             Summary: Safely persist renames of *epoch.tmp to *epoch by issuing 
fsync on parent directory -- Possible cluster unavailability otherwise
                 Key: ZOOKEEPER-2562
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2562
             Project: ZooKeeper
          Issue Type: Bug
         Environment: Three node linux cluster
            Reporter: Ramnatthan Alagappan
             Fix For: 3.4.7


I am running a three node ZooKeeper cluster. 

Renames of acceptedEpoch.tmp to acceptedEpoch and currentEpoch.tmp to 
currentEpoch have to persisted to disk by explicitly issuing fsync on the 
parent directory. If not, the rename might not hit the disk immediately and if 
a crash occurs at this point, then the server would fail to start with the 
following error in the log. If this happens on two more or nodes, then the 
cluster can become unavailable.   

[myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: 
/tmp/zoo2.cfg
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.2 to address: /127.0.0.2
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.4 to address: /127.0.0.4
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.3 to address: /127.0.0.3
[myid:] - INFO  [main:QuorumPeerConfig@331] - Defaulting to majority quorums
[myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount 
set to 3
[myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set 
to 0
[myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
[myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
[myid:1] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 
0.0.0.0/0.0.0.0:2182
[myid:1] - INFO  [main:QuorumPeer@1019] - tickTime set to 2000
[myid:1] - INFO  [main:QuorumPeer@1039] - minSessionTimeout set to -1
[myid:1] - INFO  [main:QuorumPeer@1050] - maxSessionTimeout set to -1
[myid:1] - INFO  [main:QuorumPeer@1065] - initLimit set to 5
[myid:1] - INFO  [main:FileSnap@83] - Reading snapshot 
/run/shm/dice-4636/113-98-129-z_majority_RO_OM_0=60_1=55/rdir-0/version-2/snapshot.100000002
[myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2016-04-15 03:24:57,144 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected 
exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server 
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: The accepted epoch, 1 is less than the current 
epoch, 2
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
        ... 4 more

Similarly, when new log file is created, the parent directory needs be 
explicitly fsynced to persist the log file. Otherwise a data loss might be 
possible (We have reproduced the above issues).  Please see this: 
https://www.quora.com/Linux/When-should-you-fsync-the-containing-directory-in-addition-to-the-file-itself
 and http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to