[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880062#comment-13880062
 ] 

Daryn Sharp commented on HDFS-5788:
-----------------------------------

For a bit more context, we had roughly 6-7k tasks (erroneously) issuing 
listLocatedStatus.  Each limited response was over 1M.  The handler attempts a 
non-blocking write for the response.  If the entire response cannot be written, 
the call is added to the background responder thread.  The kernel accepts well 
below 1M for a non-blocking write so all the responses were added to the 
responder thread.
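
To illustrate the partial-write behavior described above, here is a minimal 
sketch.  It is not the NN's RPC server code: a java.nio Pipe stands in for the 
client socket (an assumption made so the example is self-contained), and the 
class and method names are hypothetical.  It attempts one non-blocking write of 
a 1M response and reports how much is left unsent, which is the case where the 
call would be handed to the responder thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PartialWriteDemo {

    /** Attempts a single non-blocking write and returns the bytes left unsent. */
    static int unsentAfterOneWrite(int responseSize) {
        try {
            Pipe pipe = Pipe.open();
            try {
                // Non-blocking: write() returns once the kernel buffer is full.
                pipe.sink().configureBlocking(false);
                ByteBuffer response = ByteBuffer.allocate(responseSize);
                pipe.sink().write(response);
                // Nothing reads from the pipe, so whatever did not fit stays here.
                return response.remaining();
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1M response, as in the incident described above
        int unsent = unsentAfterOneWrite(size);
        System.out.println("accepted=" + (size - unsent) + " unsent=" + unsent);
        if (unsent > 0) {
            System.out.println("response would be queued for the responder thread");
        }
    }
}
```

The exact number of bytes accepted depends on the platform's kernel buffer 
size, so the sketch prints it rather than assuming a value.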

The call response byte buffers track the position of the last write, thus the 
entire response buffer is retained until the full response is sent.  
Re-allocating a buffer with the unsent response will likely introduce 
additional memory pressure, so the simplest and most logical change is limiting 
the response size of the located status.
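
A small sketch of why the whole buffer stays on the heap: a ByteBuffer's 
position advances as bytes are written, but its capacity (the backing 
allocation) does not shrink.  The class and method names are hypothetical, and 
the 64K "accepted" figure is an assumed value for illustration, not a 
measurement from the incident.

```java
import java.nio.ByteBuffer;

public class RetainedBufferDemo {
    static final int RESPONSE_SIZE = 1 << 20; // 1M, the lower bound quoted above

    /** Simulates a partial write by advancing the position, as a channel write would. */
    static ByteBuffer partiallySent(int bytesAccepted) {
        ByteBuffer response = ByteBuffer.allocate(RESPONSE_SIZE);
        response.position(bytesAccepted);
        return response;
    }

    /** Heap held by queued responses: each pending call pins its full buffer. */
    static long retainedBytes(int pendingCalls) {
        return (long) pendingCalls * RESPONSE_SIZE;
    }

    public static void main(String[] args) {
        ByteBuffer response = partiallySent(64 * 1024); // assumed 64K accepted
        System.out.println("unsent bytes:    " + response.remaining());
        System.out.println("bytes still held: " + response.capacity());
        System.out.println("held by 6000 pending calls: " + retainedBytes(6000));
    }
}
```

With 6-7k pending calls at over 1M each, this lower bound is already several 
gigabytes, the same order of magnitude as the heap growth we saw.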

The end result in our case was the heap bloating by over 8G.  Full GC kicked 
in.  The NN was unresponsive for up to 5 minutes at a time.  Each time it woke up, it 
marked DNs as dead, causing a flurry of replications which further aggravated 
the memory issue.  Due to other exposed bugs, the NN required a restart.

Although more RPCs are required to satisfy the large requests, I believe the 
tradeoff is reasonable.  It's also not likely to be a common occurrence.
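
One way such a limit could work is sketched below.  This is not the attached 
patch: the class, the method names, and the idea of charging each file 
blocks * replication against a location budget are assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ListingLimitSketch {

    /** Hypothetical per-file summary: block count and replicas per block. */
    static final class FileEntry {
        final int blocks;
        final int replication;

        FileEntry(int blocks, int replication) {
            this.blocks = blocks;
            this.replication = replication;
        }
    }

    /** Builds a directory of n identical files (helper for the example). */
    static List<FileEntry> uniform(int n, int blocks, int replication) {
        List<FileEntry> entries = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            entries.add(new FileEntry(blocks, replication));
        }
        return entries;
    }

    /**
     * Returns how many leading entries fit in one response, bounded by both
     * an entry limit and a budget on the total number of block locations.
     */
    static int entriesToReturn(List<FileEntry> entries, int maxEntries, int maxLocations) {
        int count = 0;
        long locations = 0;
        for (FileEntry e : entries) {
            if (count >= maxEntries) {
                break;
            }
            long cost = (long) e.blocks * e.replication;
            // Always return at least one entry so the client makes progress.
            if (count > 0 && locations + cost > maxLocations) {
                break;
            }
            locations += cost;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The directory from the issue description: 7000 files, 4 blocks, 3 replicas.
        List<FileEntry> dir = uniform(7000, 4, 3);
        System.out.println(entriesToReturn(dir, 1000, 1000)); // prints 83
    }
}
```

With a budget of 1000 locations, each response carries 83 files instead of 
1000, so the client needs more RPCs but each response is much smaller.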

> listLocatedStatus response can be very large
> --------------------------------------------
>
>                 Key: HDFS-5788
>                 URL: https://issues.apache.org/jira/browse/HDFS-5788
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: HDFS-5788.patch
>
>
> Currently we limit the size of listStatus requests to a default of 1000 
> entries. This works fine except in the case of listLocatedStatus where the 
> location information can be quite large. As an example, a directory with 7000 
> entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
> over 1MB. This can chew up very large amounts of memory in the NN if lots of 
> clients try to do this simultaneously.
> Seems like it would be better if we also considered the amount of location 
> information being returned when deciding how many files to return.
> Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
