Erik Krogen created HDFS-13977:
----------------------------------

             Summary: NameNode can kill itself if it tries to send too many txns to a QJM simultaneously
                 Key: HDFS-13977
                 URL: https://issues.apache.org/jira/browse/HDFS-13977
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode, qjm
    Affects Versions: 2.7.7
            Reporter: Erik Krogen
            Assignee: Erik Krogen


h3. Problem & Logs
We recently encountered an issue on a large cluster (running 2.7.4) in which 
the NameNode killed itself because it was unable to communicate with the JNs 
via QJM. We discovered that it was the result of the NameNode trying to send a 
huge batch of over 1 million transactions to the JNs in a single RPC:
{code:title=NameNode Logs}
WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
...
WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
{code}
{code:title=JournalNode Logs}
INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864.  RPC came from X.X.X.X]
java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864.  RPC came from X.X.X.X
        at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
{code}
The JournalNodes rejected the RPC because its size (roughly 320 MB) was well over the 64 MB default {{ipc.maximum.data.length}}.

This was triggered by a huge number of files all hitting a hard lease timeout simultaneously, causing the NN to force-close them all at once. This is a particularly nasty bug because the NN will attempt to re-send the same huge RPC on restart: it loads an fsimage which still contains all of these open files that need to be force-closed.

h3. Proposed Solution
To solve this, we propose modifying {{EditsDoubleBuffer}} to add a "hard limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}} or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it exceeds the hard limit, block the writer until the buffer is flipped and {{bufCurrent}} becomes {{bufReady}}. This gives some self-throttling to prevent the NameNode from killing itself in this way. A rough sketch of the idea follows.
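Below is a minimal, self-contained sketch of the proposed throttling, not the actual patch: the class name, the {{waitIfOverHardLimit()}} helper, and the use of plain {{ByteArrayOutputStream}} are illustrative simplifications of the real {{EditsDoubleBuffer}} internals, and only {{writeRaw()}}, {{setReadyToFlush()}}, {{bufCurrent}}, and {{bufReady}} correspond to names in the existing class.
{code:title=Illustrative sketch of the proposed hard limit (not the real implementation)}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Simplified stand-in for EditsDoubleBuffer with the proposed hard limit. */
class ThrottlingDoubleBuffer {
  // Hard limit derived from ipc.maximum.data.length (64 MB by default).
  private final int hardLimitBytes;

  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

  ThrottlingDoubleBuffer(int ipcMaxDataLength) {
    this.hardLimitBytes = ipcMaxDataLength;
  }

  /** Called by the edit-log writer; writeOp() would follow the same pattern. */
  synchronized void writeRaw(byte[] bytes, int off, int len) throws IOException {
    waitIfOverHardLimit();
    bufCurrent.write(bytes, off, len);
  }

  /**
   * Block the writer while bufCurrent already holds at least hardLimitBytes,
   * until a flush/log roll flips the buffers via setReadyToFlush().
   */
  private synchronized void waitIfOverHardLimit() throws IOException {
    while (bufCurrent.size() >= hardLimitBytes) {
      try {
        wait(1000); // releases the monitor so setReadyToFlush() can run
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted waiting for edits buffer flip", ie);
      }
    }
  }

  /** Flush path: swap bufCurrent and bufReady, then wake any blocked writer. */
  synchronized void setReadyToFlush() {
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
    bufCurrent.reset();
    notifyAll();
  }
}
{code}
With a limit tied to {{ipc.maximum.data.length}}, a single flush batch can never grow past what the JournalNode's IPC server will accept, so the failure mode above becomes backpressure on the writer instead of a fatal RPC rejection.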


