[
https://issues.apache.org/jira/browse/HADOOP-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587489#action_12587489
]
Raghu Angadi commented on HADOOP-2910:
--------------------------------------
Setting the Windows problem aside, here is a proposal for this jira and for
improvements in the near future:
For this jira:
# callQueue3.patch
# Increase default backlog to a large value. There is no advantage to a smaller
value.
# Make the client connect timeout short (15 sec?). The client retries many
times (possibly for a couple of hours) in case of a timeout. A short timeout
is better than waiting for the 189-sec TCP timeout mentioned above, since it
caps the connection latency at 15 sec when the server is temporarily busy.
# Make Server.doAccept() accept more than one connection at a time, something
like 10 or 100 each time (not strictly required).
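The server-side items above can be sketched together: bind with a large listen backlog, connect clients with a short timeout, and drain several pending connections per accept pass. This is a minimal illustration, not the actual Hadoop Server code; the class and method names, the backlog of 1024, and the batch size of 10 are all assumptions for the sketch.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class AcceptSketch {
    // Illustrative doAccept: take up to maxPerCall pending connections per
    // selector wakeup instead of exactly one. In non-blocking mode,
    // accept() returns null once no more connections are pending.
    static int doAccept(ServerSocketChannel server, int maxPerCall)
            throws java.io.IOException {
        int accepted = 0;
        SocketChannel channel;
        while (accepted < maxPerCall && (channel = server.accept()) != null) {
            channel.configureBlocking(false);
            // ... a real server would register the channel with a read
            // selector here; the sketch just closes it ...
            channel.close();
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        // Large listen backlog: during a burst, pending connections wait in
        // the kernel queue instead of being refused (the effective value is
        // also capped by the OS, e.g. net.core.somaxconn on Linux).
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0), 1024);
        InetSocketAddress addr =
                (InetSocketAddress) server.socket().getLocalSocketAddress();

        Socket[] clients = new Socket[3];
        for (int i = 0; i < clients.length; i++) {
            clients[i] = new Socket();
            // Short connect timeout caps latency when the server is busy,
            // versus waiting out the long OS-level TCP timeout.
            clients[i].connect(addr, 15_000);
        }
        Thread.sleep(200); // let the kernel finish the handshakes

        System.out.println("accepted=" + doAccept(server, 10));
        for (Socket c : clients) c.close();
        server.close();
    }
}
```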
Useful changes, possibly in the near future:
# Limit the total number of accepted connections, so that the Server does not
run out of file descriptors. This limit could be something like 90% of the fd
limit of the process (if we can obtain such a limit).
# Make the max queue size proportional to the number of clients rather than
the number of handlers. If the server does not read the RPC, the kernel ends
up paying for the memory instead of the process. Even with 100k queued
requests, it probably does not take much more than 100MB. I suggest something
like 20-30k.
# With a larger queue size, there could be a worst case that takes a lot of
memory (e.g. block reports from 4k datanodes), so the Server could have a
memory limit as well.
#- This memory calculation need not be very accurate. It could be
proportional to the size of the RPC request on the wire (say, 1.5 times the
bytes read from the socket for an RPC, plus the length of the write data).
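The memory-bounded queue idea above could look something like the following sketch. The class and field names, and the use of the 1.5x wire-size estimate as the per-call cost, are illustrative assumptions, not the actual patch.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BoundedCallQueue {
    // Sketch: bound the call queue by approximate memory rather than by
    // call count. The accounting is deliberately rough, per the comment.
    private final Queue<byte[]> calls = new ArrayDeque<>();
    private final long memoryLimit;
    private long memoryUsed;

    BoundedCallQueue(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    // Rough per-call cost: ~1.5x the bytes read from the socket.
    static long estimateSize(int wireBytes) {
        return wireBytes * 3 / 2;
    }

    /** Returns false when the queue is "full". The caller (the reader
     *  thread) would then stop reading from sockets, so requests pile up
     *  in kernel buffers or at the client, throttling clients via TCP
     *  flow control. */
    synchronized boolean offer(byte[] request) {
        long size = estimateSize(request.length);
        if (memoryUsed + size > memoryLimit) {
            return false;
        }
        calls.add(request);
        memoryUsed += size;
        return true;
    }

    synchronized byte[] poll() {
        byte[] request = calls.poll();
        if (request != null) {
            memoryUsed -= estimateSize(request.length);
        }
        return request;
    }

    public static void main(String[] args) {
        BoundedCallQueue q = new BoundedCallQueue(100);
        System.out.println(q.offer(new byte[40])); // 60 <= 100: accepted
        System.out.println(q.offer(new byte[40])); // 120 > 100: rejected
    }
}
```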
> Throttle IPC Client/Server during bursts of requests or server slowdown
> -----------------------------------------------------------------------
>
> Key: HADOOP-2910
> URL: https://issues.apache.org/jira/browse/HADOOP-2910
> Project: Hadoop Core
> Issue Type: Improvement
> Components: ipc
> Affects Versions: 0.16.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.18.0
>
> Attachments: callQueue.patch, callQueue1.patch, callQueue2.patch,
> callQueue3.patch, TestBacklog.java, TestBacklogWithPool.java,
> throttleClient.patch
>
>
> I propose the following to avoid an IPC server being swarmed by too many
> requests and connections:
> 1. Limit call queue length or limit the amount of memory used in the call
> queue. This can be done by including the size of a request in the header and
> storing unmarshaled requests in the call queue.
> 2. If the call queue is full or queue buffer is full, stop reading requests
> from sockets. So requests stay at the server's system buffer or at the client
> side and thus eventually throttle the client.
> 3. Limit the total number of connections. Do not accept new connections if
> the connection limit is exceeded. (Note: this solution is unfair to new
> connections.)
> 4. If an out-of-memory exception is received, close the current connection.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.