[ https://issues.apache.org/jira/browse/HADOOP-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525519 ]

Raghu Angadi commented on HADOOP-1849:
--------------------------------------

The server log from HADOOP-1763 would have been very useful for this. As far as 
I remember, Dhruba looked for "dropping because max q reached" messages while 
working on Namenode scalability improvements; when those messages went away, 
that was a good indicator of improvement. With a large cluster this is pretty 
easy to test.

Yes, memory should also be a concern, though increasing the handler count causes 
the same increase in queue memory, plus memory for each of the added threads 
(maybe 512k of virtual memory per thread). Datanode blockReports are one example 
where each RPC takes a lot of memory.
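
To make that trade-off concrete, here is a back-of-envelope sketch. The numbers 
(~512k per thread, 100 queue slots per handler, 10 handlers) are the ones from 
this discussion, not measurements:

    // Back-of-envelope arithmetic only; ~512k per thread is a guess, not measured.
    public class HandlerMemorySketch {
        public static void main(String[] args) {
            long threadStack = 512L * 1024;   // ~512k virtual memory per handler thread
            int oldHandlers = 10;             // default handler count
            int newHandlers = 100;            // a typical "bump the handlers" fix
            long extraStacks = (newHandlers - oldHandlers) * threadStack;
            System.out.printf("extra thread memory: ~%d MB%n",
                              extraStacks / (1024 * 1024));
            // The call queue grows with handler count either way (100 slots per
            // handler today), so the queued-RPC memory is paid whether you add
            // handlers or raise the per-handler queue size.
        }
    }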

> IPC server max queue size should be configurable
> ------------------------------------------------
>
>                 Key: HADOOP-1849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1849
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Raghu Angadi
>             Fix For: 0.15.0
>
>
> Currently the max queue size for the IPC server is set to (100 * handlers). 
> Usually when RPC failures are observed (e.g. HADOOP-1763), we increase the 
> number of handlers and the problem goes away. I think a big part of such a fix 
> is the increase in max queue size. I think we should make maxQsize per handler 
> configurable (with a bigger default than 100). There are other improvements 
> also (HADOOP-1841).
> The Server keeps reading RPC requests from clients. When the number of 
> in-flight RPCs is larger than maxQsize, the earliest RPCs are deleted. This is 
> the main feedback the Server gives the client. I have often heard from users 
> that Hadoop doesn't handle bursty traffic.
> Say the handler count is 10 (the default), so the queue holds 100 * 10 = 1000 
> RPCs, and say the Server can handle 1000 RPCs a sec (quite conservative/low 
> for a typical server); that implies an RPC can wait only 1 sec before it is 
> dropped. If there are 3000 clients and all of them send RPCs around the same 
> time (not very rare, with heartbeats etc.), 2000 will be dropped. Instead of 
> dropping the earliest RPCs, if the server delayed reading new RPCs, the 
> feedback to clients would be much smoother. I will file another jira regarding 
> queue management.
> For this jira I propose to make the queue size per handler configurable, with 
> a larger default (maybe 500); a minimal sketch follows below.
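
As a rough illustration of the proposal, here is a minimal, self-contained 
sketch. The config key names ("ipc.server.handler.count", 
"ipc.server.handler.queue.size") and the Call class are illustrative stand-ins, 
not the actual Hadoop code:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class IpcQueueSketch {
        static class Call {}   // stand-in for the IPC Server's queued call object

        // Stand-in for Hadoop's Configuration.getInt(key, default), backed by
        // system properties so the sketch runs on its own.
        static int getInt(String key, int defaultValue) {
            String v = System.getProperty(key);
            return (v == null) ? defaultValue : Integer.parseInt(v);
        }

        public static void main(String[] args) {
            int handlerCount = getInt("ipc.server.handler.count", 10);
            // Today the cap is hard-coded at 100 slots per handler; the proposal
            // is to read it from configuration with a larger default (~500).
            int queuePerHandler = getInt("ipc.server.handler.queue.size", 500);
            BlockingQueue<Call> callQueue =
                    new LinkedBlockingQueue<Call>(handlerCount * queuePerHandler);
            System.out.println("call queue capacity = " + callQueue.remainingCapacity());
        }
    }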

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
