[ https://issues.apache.org/jira/browse/HADOOP-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871337#comment-13871337 ]

Daryn Sharp commented on HADOOP-10233:
--------------------------------------

Issue was seen when an errant job's 6k tasks issued listLocatedStatus on a 
9k-entry directory.  Each initial response, covering the first 1k entries, was 
~1.2MB.  The full response couldn't be sent immediately, so the response queue 
held 6GB+ of buffers.  The NN repeatedly went into GC for up to 5 minutes.
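Rough arithmetic shows how the backlog reaches that size. This assumes, which the comment does not state, that nearly every task had its ~1.2MB first batch queued at the same moment:

```latex
6\,000 \ \text{responses} \times 1.2\,\text{MB} \approx 7.2\,\text{GB}
```

That upper bound is in line with the 6GB+ of queued buffers reported above.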

> RPC lacks output flow control
> -----------------------------
>
>                 Key: HADOOP-10233
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10233
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The RPC layer has input flow control via the callq; however, it lacks any 
> output flow control.  A handler will first try to send the response directly.  
> If the full response is not sent, the remainder is queued for the background 
> responder thread.  The RPC layer may end up queuing so many buffers that it 
> "locks" up in GC.
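One possible shape for output flow control is sketched below in Java, since Hadoop's IPC layer is Java. The idea is to put a byte budget on unsent responses. A handler that cannot finish a write parks the remainder in the queue, but it blocks, up to a timeout, while the queue is over budget. The responder reports bytes as it writes them, which wakes the blocked handlers. `BoundedResponseQueue`, `offer`, `peek`, `written`, and the budget value are hypothetical names. They are not Hadoop's `ipc.Server` API, and this is not necessarily how HADOOP-10233 was resolved.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: caps the total bytes of queued, unsent RPC responses. */
final class BoundedResponseQueue {
    private final long maxPendingBytes;
    private final Deque<ByteBuffer> pending = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private long pendingBytes = 0;

    BoundedResponseQueue(long maxPendingBytes) {
        if (maxPendingBytes <= 0) {
            throw new IllegalArgumentException("maxPendingBytes must be positive");
        }
        this.maxPendingBytes = maxPendingBytes;
    }

    /**
     * Handler side: queue the unsent remainder of a response. Blocks while the
     * queue is over budget (backpressure) and returns false on timeout.
     */
    boolean offer(ByteBuffer remainder, long timeout, TimeUnit unit)
            throws InterruptedException {
        long size = remainder.remaining();
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            // An empty queue always admits, so one oversized response cannot deadlock.
            while (!pending.isEmpty() && pendingBytes + size > maxPendingBytes) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = drained.awaitNanos(nanos);
            }
            pending.addLast(remainder);
            pendingBytes += size;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Responder side: the oldest unsent buffer, or null if none is queued. */
    ByteBuffer peek() {
        lock.lock();
        try {
            return pending.peekFirst();
        } finally {
            lock.unlock();
        }
    }

    /** Responder side: record that `bytes` of the head buffer reached the socket. */
    void written(long bytes) {
        lock.lock();
        try {
            pendingBytes -= bytes;
            ByteBuffer head = pending.peekFirst();
            if (head != null && !head.hasRemaining()) {
                pending.pollFirst(); // fully sent, drop it
            }
            drained.signalAll(); // wake handlers waiting for budget
        } finally {
            lock.unlock();
        }
    }

    long pendingBytes() {
        lock.lock();
        try {
            return pendingBytes;
        } finally {
            lock.unlock();
        }
    }
}
```

With a 2MB budget, a first 1.2MB remainder is admitted, and a second one waits until the responder reports the first as written. Blocking handlers in this way pushes the pressure back into the callq, which already has input flow control. The trade-off is that one slow reader can tie up handlers, which is why the wait is bounded by a timeout.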



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
