[ https://issues.apache.org/jira/browse/HDFS-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913468#comment-16913468 ]

Erik Krogen commented on HDFS-13977:
------------------------------------

Good catch on the "less than" [~vagarychen]! Thanks. The check is in the method 
{{setOutputBufferSize()}}, but I checked, and you're right that this size is only 
ever changed from the hard-coded 512K default in tests. I changed the wording 
slightly in v003 to make the message clearer.
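
For anyone following along, here is a purely hypothetical sketch of the kind of 
check being discussed; only the method name {{setOutputBufferSize()}} and the 
512K default come from this thread, the field names, limit value, and message 
are made up:

{code:title=Hypothetical sketch of the size check}
// Hypothetical illustration only -- not the code from the v003 patch.
class OutputBufferSizeCheckSketch {
  // Assumed: hard limit derived from ipc.maximum.data.length (64 MB default).
  private final int hardLimit = 64 * 1024 * 1024;
  // Assumed field; the size is hard-coded to 512K outside of tests.
  private int outputBufferSize = 512 * 1024;

  void setOutputBufferSize(int size) {
    // The "less than" point above: the configured size must stay strictly
    // below the hard limit.
    if (size >= hardLimit) {
      throw new IllegalArgumentException("Requested output buffer size " + size
          + " must be less than the hard limit of " + hardLimit + " bytes");
    }
    this.outputBufferSize = size;
  }
}
{code}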

> NameNode can kill itself if it tries to send too many txns to a QJM 
> simultaneously
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13977
>                 URL: https://issues.apache.org/jira/browse/HDFS-13977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.7
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-13977.000.patch, HDFS-13977.001.patch, 
> HDFS-13977.002.patch, HDFS-13977.003.patch
>
>
> h3. Problem & Logs
> We recently encountered an issue on a large cluster (running 2.7.4) in which 
> the NameNode killed itself because it was unable to communicate with the JNs 
> via QJM. We discovered that it was the result of the NameNode trying to send 
> a huge batch of over 1 million transactions to the JNs in a single RPC:
> {code:title=NameNode Logs}
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
> ...
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
> {code}
> {code:title=JournalNode Logs}
> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864.  RPC came from X.X.X.X]
> java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864.  RPC came from X.X.X.X
>         at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
> {code}
> The JournalNodes rejected the RPC because its size was well over the 64 MB 
> default {{ipc.maximum.data.length}}.
> This was triggered by a huge number of files all hitting a hard lease timeout 
> simultaneously, causing the NN to force-close them all at once. This can be a 
> particularly nasty bug because the NN will attempt to re-send the same huge RPC 
> on restart: it loads an fsimage which still contains all of these open files 
> that need to be force-closed.
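> To make the failure mode concrete, below is a minimal, self-contained sketch 
> of the length check that rejects such an RPC (modeled on the 
> {{Server$Connection.checkDataLength()}} frame in the stack trace above, not 
> the actual code), using the 335886776-byte request from the logs:
> {code:title=Sketch of the RPC length check}
> // Simplified illustration of why the JournalNode rejects the call; the real
> // check lives in org.apache.hadoop.ipc.Server (see the stack trace above).
> import java.io.IOException;
>
> public class RpcLengthCheckSketch {
>   // Default value of ipc.maximum.data.length: 64 MB.
>   static final int MAX_DATA_LENGTH = 64 * 1024 * 1024; // 67108864
>
>   static void checkDataLength(int dataLength) throws IOException {
>     if (dataLength > MAX_DATA_LENGTH) {
>       throw new IOException("Requested data length " + dataLength
>           + " is longer than maximum configured RPC length " + MAX_DATA_LENGTH);
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     // The batch from the logs is ~335 MB, roughly 5x the 64 MB limit.
>     checkDataLength(335886776);
>   }
> }
> {code}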
> h3. Proposed Solution
> To solve this, we propose modifying {{EditsDoubleBuffer}} to add a "hard 
> limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}} 
> or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it 
> exceeds the hard limit, block the writer until the buffer is flipped and 
> {{bufCurrent}} becomes {{bufReady}}. This gives some self-throttling to 
> prevent the NameNode from killing itself in this way.
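> A rough sketch of the idea (not the attached patch; only {{bufCurrent}}/{{bufReady}}, 
> the write methods, and the hard-limit behavior come from the description above, 
> while the flip method name and the limit value are assumptions):
> {code:title=Sketch of the proposed self-throttling}
> // Rough sketch only -- NOT the HDFS-13977 patch. Illustrates blocking the
> // writer when the in-progress buffer grows past a hard limit, until the
> // flushing thread flips the buffers.
> import java.io.ByteArrayOutputStream;
>
> public class ThrottledDoubleBufferSketch {
>   private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
>   private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
>   // Assumed hard limit derived from ipc.maximum.data.length (64 MB default),
>   // with some headroom left for RPC framing.
>   private final int hardLimit = 32 * 1024 * 1024;
>
>   // Writer side: called for each serialized edit op (cf. writeOp()/writeRaw()).
>   public synchronized void writeRaw(byte[] op) throws InterruptedException {
>     // Self-throttle: block while adding this op would push bufCurrent past
>     // the hard limit; the flip below wakes the writer up.
>     while (bufCurrent.size() + op.length > hardLimit) {
>       wait();
>     }
>     bufCurrent.write(op, 0, op.length);
>   }
>
>   // Flusher side: flip the buffers so bufCurrent becomes bufReady.
>   public synchronized void setReadyToFlush() {
>     ByteArrayOutputStream tmp = bufReady;
>     bufReady = bufCurrent;
>     bufCurrent = tmp;
>     bufCurrent.reset();
>     notifyAll(); // wake any writer blocked on the hard limit
>   }
> }
> {code}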



