Ramnatthan Alagappan created ZOOKEEPER-2495:
-----------------------------------------------

             Summary: Cluster unavailable on disk full(ENOSPC), disk 
quota(EDQUOT), disk write error(EIO) errors
                 Key: ZOOKEEPER-2495
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2495
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, server
    Affects Versions: 3.4.8
         Environment: Normal ZooKeeper cluster with 3 Linux nodes.
            Reporter: Ramnatthan Alagappan


A ZooKeeper cluster stalls completely, with *no* transactions making progress, 
when the current *leader* encounters a storage-related error (such as *ENOSPC, 
EDQUOT, or EIO*). 

Surprisingly, in some circumstances the same errors cause the node to crash 
completely, which allows another node in the cluster to become the leader and 
make progress with transactions. However, if the same errors are encountered 
while initializing a new log file, the current leader enters an inconsistent 
state: it does not crash, but it still thinks it is the leader (and so does not 
allow others to become the leader). *This causes the entire cluster to freeze.*

Here is the stack trace of the leader:

------------------------------------------------

2016-07-11 15:42:27,502 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] - Creating new log file: log.200000001
2016-07-11 15:42:27,505 [myid:3] - ERROR [SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:3
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:211)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
        at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)

------------------------------------------------

From the trace and the code, it looks like the problem happens only when a new 
log file is initialized, and only when an error occurs in one of two cases:

1. Error during the append of *log header*.
2. Error during *padding zero bytes to the end of the log*.
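
To make the two steps concrete, here is a minimal Java sketch of initializing a new log file: write a header, flush it, then pad the file with zero bytes. This is not ZooKeeper's FileTxnLog code. The class name LogInitSketch, the method initLog, the header fields, and the 64 KiB preallocation size are all assumptions made for illustration. The point is only that either step can fail with an IOException when the disk is full, over quota, or returning write errors.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch of new-log initialization; not ZooKeeper's implementation. */
public class LogInitSketch {
    // Illustrative values, assumed for this sketch only.
    static final int MAGIC = 0x5A4B4C47;          // the bytes "ZKLG"
    static final long PREALLOC_SIZE = 64 * 1024;  // assumed padding target

    /** Creates the log file, writes its header, zero-pads it, and returns its size. */
    public static long initLog(File f) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(f);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(fos))) {
            // Step 1: append the log header and flush it to the file.
            // A storage error (ENOSPC, EDQUOT, EIO) would surface here as IOException.
            out.writeInt(MAGIC);
            out.writeInt(2);    // version (assumed)
            out.writeLong(0L);  // database id (assumed)
            out.flush();

            // Step 2: pad the rest of the file with zero bytes up to PREALLOC_SIZE.
            // The same storage errors can surface from these writes as well.
            FileChannel ch = fos.getChannel();
            long pos = ch.size();
            ByteBuffer zeros = ByteBuffer.allocate((int) (PREALLOC_SIZE - pos));
            while (zeros.hasRemaining()) {
                pos += ch.write(zeros, pos);
            }
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("log-sketch", ".tmp");
        f.deleteOnExit();
        System.out.println("initialized log of size " + initLog(f));
    }
}
```

In this sketch both steps propagate the IOException to the caller. What matters for this issue is what the caller does with it during these two steps.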
 
If similar errors happen when writing other blocks of data, the node crashes 
completely, allowing another node to be elected as the new leader. These two 
blocks of a newly created log file are special because they take a different 
error recovery code path: the node does not crash completely. Instead, certain 
threads are killed, but the thread that holds the quorum apparently stays up, 
which prevents the other nodes from electing a new leader. As a result, the 
other nodes think there is no problem with the leader, yet the cluster becomes 
unavailable for any subsequent operation, whether read or write. 
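
The suspected mechanism can be sketched in plain Java: one thread hits a storage error and exits, while a second thread keeps running and keeps the process looking healthy to its peers. This is a simplified model of the behavior described above, not ZooKeeper code. The class name StalledLeaderSketch, the thread names, and the heartbeat counter are hypothetical, and the IOException is thrown by hand to simulate the disk error.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a process that loses its sync thread but keeps "leading". */
public class StalledLeaderSketch {

    /** Returns { syncAlive, quorumAlive, heartbeatsContinued } observed after the failure. */
    public static boolean[] simulate() throws InterruptedException {
        AtomicInteger heartbeats = new AtomicInteger();

        // Stands in for the thread that persists transactions to the log.
        Thread sync = new Thread(() -> {
            try {
                // Simulated storage error during log initialization.
                throw new IOException("Disk quota exceeded");
            } catch (IOException e) {
                // Only this thread ends; the process itself is not shut down.
                System.err.println("Severe unrecoverable error (simulated): " + e.getMessage());
            }
        }, "sync-sketch");

        // Stands in for the thread that keeps followers convinced the leader is alive.
        Thread quorum = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    heartbeats.incrementAndGet();
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "quorum-sketch");

        quorum.start();
        sync.start();
        sync.join();                      // the sync thread has died

        int before = heartbeats.get();
        Thread.sleep(100);                // observe the process after the failure
        int after = heartbeats.get();

        boolean[] result = { sync.isAlive(), quorum.isAlive(), after > before };
        quorum.interrupt();               // stop the sketch cleanly
        quorum.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = simulate();
        System.out.println("sync alive=" + r[0] + ", quorum alive=" + r[1]
                + ", heartbeats continued=" + r[2]);
    }
}
```

In this model nothing can be persisted once the sync thread is gone, yet heartbeats continue, so an outside observer has no reason to start a new election. That matches the symptom reported here, under the assumption that the quorum thread really does stay up.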



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
