[
https://issues.apache.org/jira/browse/HDFS-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916156#comment-16916156
]
Chen Liang commented on HDFS-13977:
-----------------------------------
Thanks for checking [~xkrogen]. I've committed the branch-2 patch.
> NameNode can kill itself if it tries to send too many txns to a QJM
> simultaneously
> ----------------------------------------------------------------------------------
>
> Key: HDFS-13977
> URL: https://issues.apache.org/jira/browse/HDFS-13977
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, qjm
> Affects Versions: 2.7.7
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-13977-branch-2.003.patch, HDFS-13977.000.patch,
> HDFS-13977.001.patch, HDFS-13977.002.patch, HDFS-13977.003.patch
>
>
> h3. Problem & Logs
> We recently encountered an issue on a large cluster (running 2.7.4) in which
> the NameNode killed itself because it was unable to communicate with the JNs
> via QJM. We discovered that it was the result of the NameNode trying to send
> a huge batch of over 1 million transactions to the JNs in a single RPC:
> {code:title=NameNode Logs}
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
> ...
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
> {code}
> {code:title=JournalNode Logs}
> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X]
> java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X
>         at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
> {code}
> The JournalNodes rejected the RPC because its size was well over the 64MB
> default {{ipc.maximum.data.length}}.
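> For context, the 67108864 figure in the JournalNode logs is that 64MB default.
> As a minimal sketch, a sender could read the same limit the IPC server
> enforces and size its batches below it; the class name here is hypothetical,
> while the configuration keys are the standard {{CommonConfigurationKeys}}
> constants:
> {code:title=Sketch: reading the RPC size limit}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.CommonConfigurationKeys;
>
> public class RpcLimitExample {
>   public static void main(String[] args) {
>     // Read the same limit the IPC server uses when rejecting oversized
>     // requests; the default is 64 * 1024 * 1024 = 67108864 bytes.
>     Configuration conf = new Configuration();
>     int maxRpcLength = conf.getInt(
>         CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH,
>         CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH_DEFAULT);
>     System.out.println("ipc.maximum.data.length = " + maxRpcLength);
>   }
> }
> {code}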
> This was triggered by a huge number of files all hitting a hard lease timeout
> simultaneously, causing the NN to force-close them all at once. This can be a
> particularly nasty bug as the NN will attempt to re-send this same huge RPC
> on restart, as it loads an fsimage which still has all of these open files
> that need to be force-closed.
> h3. Proposed Solution
> To solve this, we propose modifying {{EditsDoubleBuffer}} to add a "hard
> limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}}
> or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it
> exceeds the hard limit, block the writer until the buffer is flipped and
> {{bufCurrent}} becomes {{bufReady}}. This provides some self-throttling and
> prevents the NameNode from killing itself in this way.
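> To illustrate the idea, here is a toy sketch of the throttling pattern. It
> is not the actual patch: the class and field names are simplified stand-ins
> for the {{EditsDoubleBuffer}} internals, and it assumes a single op is
> always smaller than the hard limit.
> {code:title=Sketch of the hard-limit throttling (not the actual patch)}
> import java.io.ByteArrayOutputStream;
>
> /** Toy double buffer with a hard size limit; NOT the real EditsDoubleBuffer. */
> public class ThrottledDoubleBuffer {
>   private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
>   private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
>   private final int hardLimit; // e.g. derived from ipc.maximum.data.length
>
>   public ThrottledDoubleBuffer(int hardLimit) {
>     this.hardLimit = hardLimit;
>   }
>
>   /** Writer side, analogous to writeOp()/writeRaw(). */
>   public synchronized void writeRaw(byte[] bytes) throws InterruptedException {
>     // Block the writer while the in-progress buffer would exceed the hard
>     // limit; the flush path must flip the buffers before writes continue.
>     while (bufCurrent.size() + bytes.length > hardLimit) {
>       wait();
>     }
>     bufCurrent.write(bytes, 0, bytes.length);
>   }
>
>   /** Flush path: flips bufCurrent into bufReady, waking any blocked writer. */
>   public synchronized void setReadyToFlush() {
>     ByteArrayOutputStream tmp = bufReady;
>     bufReady = bufCurrent;
>     bufCurrent = tmp;
>     bufCurrent.reset();
>     notifyAll();
>   }
>
>   /** Drains bufReady, e.g. into one bounded RPC to the JournalNodes. */
>   public synchronized byte[] drainReady() {
>     byte[] out = bufReady.toByteArray();
>     bufReady.reset();
>     return out;
>   }
> }
> {code}
> Blocking the writer, rather than throwing, keeps each individual flush
> bounded to a size the JournalNodes will accept.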