[
https://issues.apache.org/jira/browse/HDFS-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121518#comment-14121518
]
Daryn Sharp commented on HDFS-6967:
-----------------------------------
This is becoming a critical issue. Wide and/or numerous jobs overwhelm DNs (due
to their smaller heaps) and sometimes the NN. Either can OOM or fall into full
GC. In the case of DNs, GC can run so long that the NN declares the node dead.
Hadoop's Jetty version does not support limiting connections; in fact, the code
contains "README" comments, buried in the bowels of private/anonymous classes,
stating that connections should be limited. Overriding Jetty's behavior would
require seriously risky hacks.
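By contrast, a bounded server rejects excess work outright rather than queuing it without limit. A minimal stdlib sketch of that behavior (illustrative only; this is not Hadoop or Jetty code) uses a ThreadPoolExecutor with a bounded queue and AbortPolicy:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedAcceptSketch {
    // Simulate 10 near-simultaneous requests against a pool with
    // 2 worker threads and a queue capped at 2 entries.
    static int[] simulate() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),            // bounded queue
                new ThreadPoolExecutor.AbortPolicy());  // reject instead of queuing forever
        int accepted = 0, rejected = 0;
        for (int i = 0; i < 10; i++) {
            try {
                pool.execute(() -> {
                    try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++; // the client would retry against another DN
            }
        }
        pool.shutdownNow();
        return new int[] { accepted, rejected };
    }

    public static void main(String[] args) {
        int[] r = simulate();
        // 2 tasks run + 2 queued are accepted; the other 6 are rejected.
        System.out.println("accepted=" + r[0] + " rejected=" + r[1]);
    }
}
```

The point is the failure mode: a rejected client fails fast and retries elsewhere, instead of parking a "heavy" connection in an unbounded queue.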
I'd like to solicit comments on an internal approach we are trying: rather
than redirecting the client to a DN holding a replica, redirect it to a random
DN in a rack that holds a replica. This distributes the Jetty connections,
DFSClients, and associated objects over many more nodes, with an insignificant
impact on throughput - especially over higher-latency links, and compared to
the time it takes for the client to time out the connection and retry another
node, which may also be overloaded.
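The selection described above can be sketched as follows; the rack-map data model and all names are assumptions for illustration, not HDFS's actual NetworkTopology or BlockManager APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class RackLocalRedirect {
    // Hypothetical sketch: pick a random DN from any rack that holds a
    // replica, not necessarily a DN that holds the replica itself.
    static String chooseRedirectTarget(Map<String, List<String>> rackToNodes,
                                       Set<String> replicaHolders,
                                       Random rng) {
        // Candidates are every node in every rack containing a replica.
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, List<String>> rack : rackToNodes.entrySet()) {
            for (String node : rack.getValue()) {
                if (replicaHolders.contains(node)) {
                    candidates.addAll(rack.getValue());
                    break;
                }
            }
        }
        // Random choice spreads the Jetty connections across the whole rack;
        // rack-local reads keep the throughput cost small.
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

For example, if /rack1 = {dn1, dn2, dn3} and only dn1 holds a replica, the redirect target may be any of dn1, dn2, or dn3, tripling the set of nodes absorbing connections for that block.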
> DNs may OOM under high webhdfs load
> -----------------------------------
>
> Key: HDFS-6967
> URL: https://issues.apache.org/jira/browse/HDFS-6967
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, webhdfs
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Assignee: Eric Payne
>
> Webhdfs uses jetty. The size of the request thread pool is limited, but
> jetty will accept and queue an unbounded number of connections. Every queued
> connection is "heavy" with buffers, etc. Unlike data streamer connections,
> thousands of webhdfs connections will quickly OOM a DN. The accepted requests
> must be bounded and excess clients rejected so they retry on a new DN.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)