Ramnatthan Alagappan created ZOOKEEPER-2560:
-----------------------------------------------

             Summary: Possible Cluster Unavailability
                 Key: ZOOKEEPER-2560
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2560
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
         Environment: Three node linux cluster
            Reporter: Ramnatthan Alagappan
             Fix For: 3.4.8


Possible Cluster Unavailability

I am running a three-node ZooKeeper cluster; each node runs Linux.

I see the following sequence of system calls when ZooKeeper appends a user data 
item to the transaction log file (a minimal sketch of the corresponding 
write-then-sync pattern follows the trace).

1 write("/data/version-2/log.200000001", offset=65, count=12)
2 write("/data/version-2/log.200000001", offset=77, count=16323)
3 write("/data/version-2/log.200000001", offset=16400, count=4209)
4 write("/data/version-2/log.200000001", offset=20609, count=1)
5 fdatasync("/data//version-2/log.200000001")

Now, a crash could happen just after operation 4 but before the final 
fdatasync. In this situation, the file system could persist the 4th write yet 
fail to persist the 3rd, because there is no fsync between them and the kernel 
is free to reorder buffered writes before the crash. In such cases, the 
ZooKeeper server fails to start, with the following messages in its log file:

[myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: 
/tmp/zoo2.cfg
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.2 to address: /127.0.0.2
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.4 to address: /127.0.0.4
[myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 
127.0.0.3 to address: /127.0.0.3
[myid:] - INFO  [main:QuorumPeerConfig@331] - Defaulting to majority quorums
[myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount 
set to 3
[myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set 
to 0
[myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
[myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
[myid:1] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 
0.0.0.0/0.0.0.0:2182
[myid:1] - INFO  [main:QuorumPeer@1019] - tickTime set to 2000
[myid:1] - INFO  [main:QuorumPeer@1039] - minSessionTimeout set to -1
[myid:1] - INFO  [main:QuorumPeer@1050] - maxSessionTimeout set to -1
[myid:1] - INFO  [main:QuorumPeer@1065] - initLimit set to 5
[myid:1] - INFO  [main:FileSnap@83] - Reading snapshot 
/data/version-2/snapshot.100000002
[myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
java.io.IOException: CRC check failed
        at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
        at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
        at 
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2016-04-15 04:00:32,795 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected 
exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server 
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: CRC check failed
        at 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
        at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
        at 
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        ... 4 more

The same happens when the 3rd and 4th writes hit the disk but the 2nd operation 
does not. 
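One way to approximate this post-crash state without an actual power failure 
is to stop the server and zero a few bytes inside the most recently appended 
record of the newest transaction log (the offset can be taken from a trace 
like the one above); on restart the server fails with the same "CRC check 
failed" error, or a related deserialization error depending on which bytes are 
hit. This is a hypothetical helper, not a tool shipped with ZooKeeper:

import java.io.IOException;
import java.io.RandomAccessFile;

public class CorruptLogBytes {
    // Usage (hypothetical): java CorruptLogBytes <log file> <offset>
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "rw")) {
            // Zero a few bytes inside the most recently appended transaction
            // record, mimicking a buffered write that never reached the disk.
            raf.seek(Long.parseLong(args[1]));
            raf.write(new byte[32]);
        }
    }
}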

Now, two nodes of a three-node cluster can easily reach this state, rendering 
the entire cluster unavailable. On recovery, ZooKeeper should be able to handle 
such checksum mismatches gracefully in order to maintain cluster availability.
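
For illustration, here is a rough sketch of what such graceful handling could 
look like: replay records until the first checksum mismatch and truncate the 
log there instead of aborting startup. The record layout below (length, 
Adler32 checksum, payload) is simplified and hypothetical, not FileTxnLog's 
real on-disk format; only the choice of Adler32 mirrors ZooKeeper's 
transaction log.

import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

public class TolerantReplay {

    // Replays records and truncates the log at the first corrupt record
    // instead of failing startup with "CRC check failed".
    public static void replay(String path) throws IOException {
        try (RandomAccessFile log = new RandomAccessFile(path, "rw")) {
            long lastGood = 0;
            try {
                while (true) {
                    int len = log.readInt();              // simplified record header
                    long storedCrc = log.readLong();
                    if (len <= 0 || len > 64 * 1024 * 1024) {
                        break;                            // header itself looks corrupt
                    }
                    byte[] payload = new byte[len];
                    log.readFully(payload);
                    Checksum crc = new Adler32();
                    crc.update(payload, 0, payload.length);
                    if (crc.getValue() != storedCrc) {
                        break;                            // partially persisted tail: stop here
                    }
                    apply(payload);                       // hand the txn to the in-memory database
                    lastGood = log.getFilePointer();
                }
            } catch (EOFException e) {
                // clean end of log; nothing more to replay
            }
            // Drop everything after the last record that verified.
            log.setLength(lastGood);
        }
    }

    private static void apply(byte[] txn) {
        // placeholder: deserialize and apply the transaction
    }
}

Truncating at the first bad record also discards any later transactions in the 
same file, so a real fix would have to weigh that data loss against refusing 
to start at all (which is what happens today).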



