[
https://issues.apache.org/jira/browse/HDFS-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122937#comment-14122937
]
Daryn Sharp commented on HDFS-6967:
-----------------------------------
Popular in the sense that wide jobs needed access to ~3 replicas, yes. Jobs
with 2-5k mappers that ran fine with hdfs caused significant problems with
webhdfs. We saw up to 20k jetty requests queued up while the DN was in the
throes of GC. Presumably the clients were timing out and retrying the
connection, which further exacerbates the memory problem because the jetty
queue fills with heavy objects for dead connections.
We could take the existing DN load into account, such that we pick the
lightest xceiver-loaded DN before picking a random rack-local one. However,
the risk is that an onslaught of tasks will be distributed to the DNs holding
the replica before those DNs can report their spikes in load. This is already
a minor issue for hdfs. In the past we've considered using a "predictive"
load, where the NN artificially increases a DN's load stats as it gives out
locations. In the end, we chose to keep it simple with random rack-local.
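The "predictive" load idea above can be sketched roughly as follows. This is a hypothetical illustration, not HDFS code: the class name, the per-DN load map, and the one-unit increment are all assumptions made for the example. The point is only that the NN charges a DN for each location it hands out, so a burst of requests between heartbeats is spread instead of all landing on the currently lightest node.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "predictive" load-aware replica selection.
class PredictiveSelector {
    // load reported at the last heartbeat, plus artificial increments
    private final Map<String, Integer> load = new HashMap<>();

    // a heartbeat resets the entry to the DN's real xceiver count
    void reportLoad(String dn, int xceivers) {
        load.put(dn, xceivers);
    }

    // pick the replica holder with the lowest predicted load, then
    // artificially charge it one unit for the location just given out
    String pick(List<String> replicaHolders) {
        String best = replicaHolders.stream()
            .min(Comparator.comparingInt(dn -> load.getOrDefault(dn, 0)))
            .orElseThrow();
        load.merge(best, 1, Integer::sum);
        return best;
    }
}
```

With this scheme, two back-to-back requests for the same block are steered to different replica holders even though no new heartbeat has arrived in between.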
> DNs may OOM under high webhdfs load
> -----------------------------------
>
> Key: HDFS-6967
> URL: https://issues.apache.org/jira/browse/HDFS-6967
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, webhdfs
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Assignee: Eric Payne
>
> Webhdfs uses jetty. The size of the request thread pool is limited, but
> jetty will accept and queue infinite connections. Every queued connection is
> "heavy" with buffers, etc. Unlike data streamer connections, thousands of
> webhdfs connections will quickly OOM a DN. The accepted requests must be
> bounded and excess clients rejected so they retry on a new DN.
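The fix the report asks for amounts to bounding both the worker pool and the accept queue, and rejecting overflow instead of queuing unbounded heavy connections. A minimal sketch using the standard java.util.concurrent API (the class name, pool size of 4, and queue depth of 16 are illustrative assumptions, not the actual DN configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: bounded workers + bounded queue, reject the rest.
class BoundedRequestPool {
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4, 4,                                  // fixed worker threads (illustrative)
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(16),          // bounded accept queue (illustrative)
        new ThreadPoolExecutor.AbortPolicy()); // reject instead of queuing forever

    // returns false when saturated, so the caller can close the
    // connection and let the client retry against another DN
    boolean tryHandle(Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    void shutdown() { pool.shutdown(); }
}
```

Once 4 requests are running and 16 are queued, every further request is rejected immediately rather than held as a heavy object, which keeps the DN's heap bounded under a client onslaught.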
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)