[
https://issues.apache.org/jira/browse/HDFS-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916156#comment-16916156
]
Chen Liang commented on HDFS-13977:
-----------------------------------
Thanks for checking [~xkrogen]. I've committed the branch-2 patch.
> NameNode can kill itself if it tries to send too many txns to a QJM
> simultaneously
> ----------------------------------------------------------------------------------
>
> Key: HDFS-13977
> URL: https://issues.apache.org/jira/browse/HDFS-13977
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, qjm
> Affects Versions: 2.7.7
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-13977-branch-2.003.patch, HDFS-13977.000.patch,
> HDFS-13977.001.patch, HDFS-13977.002.patch, HDFS-13977.003.patch
>
>
> h3. Problem & Logs
> We recently encountered an issue on a large cluster (running 2.7.4) in which
> the NameNode killed itself because it was unable to communicate with the JNs
> via QJM. We discovered that it was the result of the NameNode trying to send
> a huge batch of over 1 million transactions to the JNs in a single RPC:
> {code:title=NameNode Logs}
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
> ...
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
> {code}
> {code:title=JournalNode Logs}
> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X]
> java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X
>         at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
> {code}
> The JournalNodes rejected the RPC because its size was well over the 64MB
> default {{ipc.maximum.data.length}}.
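> For context, the 67108864 figure in the JournalNode logs is that 64MB default.
> As a minimal sketch, a sender could read the same limit the IPC server
> enforces and size its batches below it; the class name here is hypothetical,
> while the configuration keys are the standard {{CommonConfigurationKeys}}
> constants:
> {code:title=Sketch: reading the RPC size limit}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.CommonConfigurationKeys;
>
> public class RpcLimitExample {
>   public static void main(String[] args) {
>     // Read the same limit the IPC server uses when rejecting oversized
>     // requests; the default is 64 * 1024 * 1024 = 67108864 bytes.
>     Configuration conf = new Configuration();
>     int maxRpcLength = conf.getInt(
>         CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH,
>         CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH_DEFAULT);
>     System.out.println("ipc.maximum.data.length = " + maxRpcLength);
>   }
> }
> {code}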
> This was triggered by a huge number of files all hitting a hard lease timeout
> simultaneously, causing the NN to force-close them all at once. This can be a
> particularly nasty bug as the NN will attempt to re-send this same huge RPC
> on restart, as it loads an fsimage which still has all of these open files
> that need to be force-closed.
> h3. Proposed Solution
> To solve this, we propose modifying {{EditsDoubleBuffer}} to add a "hard
> limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}}
> or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it
> exceeds the hard limit, block the writer until the buffer is flipped and
> {{bufCurrent}} becomes {{bufReady}}. This provides some self-throttling and
> prevents the NameNode from killing itself in this way.
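> To illustrate the idea, here is a toy sketch of the throttling pattern. It
> is not the actual patch: the class and field names are simplified stand-ins
> for the {{EditsDoubleBuffer}} internals, and it assumes a single op is
> always smaller than the hard limit.
> {code:title=Sketch of the hard-limit throttling (not the actual patch)}
> import java.io.ByteArrayOutputStream;
>
> /** Toy double buffer with a hard size limit; NOT the real EditsDoubleBuffer. */
> public class ThrottledDoubleBuffer {
>   private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
>   private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
>   private final int hardLimit; // e.g. derived from ipc.maximum.data.length
>
>   public ThrottledDoubleBuffer(int hardLimit) {
>     this.hardLimit = hardLimit;
>   }
>
>   /** Writer side, analogous to writeOp()/writeRaw(). */
>   public synchronized void writeRaw(byte[] bytes) throws InterruptedException {
>     // Block the writer while the in-progress buffer would exceed the hard
>     // limit; the flush path must flip the buffers before writes continue.
>     while (bufCurrent.size() + bytes.length > hardLimit) {
>       wait();
>     }
>     bufCurrent.write(bytes, 0, bytes.length);
>   }
>
>   /** Flush path: flips bufCurrent into bufReady, waking any blocked writer. */
>   public synchronized void setReadyToFlush() {
>     ByteArrayOutputStream tmp = bufReady;
>     bufReady = bufCurrent;
>     bufCurrent = tmp;
>     bufCurrent.reset();
>     notifyAll();
>   }
>
>   /** Drains bufReady, e.g. into one bounded RPC to the JournalNodes. */
>   public synchronized byte[] drainReady() {
>     byte[] out = bufReady.toByteArray();
>     bufReady.reset();
>     return out;
>   }
> }
> {code}
> Blocking the writer, rather than throwing, keeps each individual flush
> bounded to a size the JournalNodes will accept.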