[ https://issues.apache.org/jira/browse/HADOOP-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-2864:
----------------------------------
Attachment: RPCScalabilityDesignWeb.pdf
The design document is attached.
> Improve the Scalability and Robustness of IPC
> ---------------------------------------------
>
> Key: HADOOP-2864
> URL: https://issues.apache.org/jira/browse/HADOOP-2864
> Project: Hadoop Core
> Issue Type: Improvement
> Components: ipc
> Affects Versions: 0.16.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.17.0
>
> Attachments: RPCScalabilityDesignWeb.pdf
>
>
> This jira is intended to enhance IPC's scalability and robustness.
> Currently an IPC server can easily hang due to a disk failure or garbage
> collection, during which it cannot respond to clients promptly. This has
> caused many dropped calls and delayed responses, and as a result many running
> applications fail on timeout. On the other hand, if busy clients send a lot of
> requests to the server in a short period of time, or too many clients
> communicate with the server simultaneously, the server may be swamped by
> requests and become unresponsive.
> The proposed changes aim to:
> # provide better client/server coordination
> #* The server should be able to throttle clients during bursts of requests.
> #* A slow client should not prevent the server from serving other clients.
> #* A temporarily hung server should not cause catastrophic failures in
> clients.
> # enable the client and server to detect remote-side failures. Examples
> include: (1) the remote host crashes; (2) the remote host crashes and
> then reboots; (3) the remote process crashes or is shut down by an operator.
> # ensure fairness: each client should be able to make progress.
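
The throttling and fairness goals above can interact in a simple way: if the server bounds the number of pending calls per client and serves clients round-robin, a bursty client is throttled without starving the others. The sketch below is purely illustrative and assumes such a design; it is not taken from RPCScalabilityDesignWeb.pdf or from Hadoop's IPC code, and the class `FairCallQueue`, its methods, and the `maxPerClient` limit are hypothetical names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration only, not Hadoop's IPC Server: one bounded
 * queue of pending calls per client, served round-robin across clients.
 */
final class FairCallQueue<C> {
    private final int maxPerClient;
    // Pending calls per client; an entry exists only while it is non-empty.
    private final Map<String, ArrayDeque<C>> queues = new HashMap<>();
    // Clients with pending calls, in the order they will next be served.
    private final ArrayDeque<String> rotation = new ArrayDeque<>();

    FairCallQueue(int maxPerClient) {
        if (maxPerClient <= 0) {
            throw new IllegalArgumentException("maxPerClient must be positive");
        }
        this.maxPerClient = maxPerClient;
    }

    /**
     * Enqueues a call. Returns false if this client already has
     * maxPerClient pending calls; a reader thread would then stop reading
     * from that client's connection until its backlog drains (throttling).
     */
    synchronized boolean offer(String clientId, C call) {
        ArrayDeque<C> q = queues.computeIfAbsent(clientId, id -> new ArrayDeque<>());
        if (q.size() >= maxPerClient) {
            return false;
        }
        if (q.isEmpty()) {
            rotation.addLast(clientId); // client becomes active: join the rotation
        }
        q.addLast(call);
        return true;
    }

    /**
     * Returns the next call, taking at most one call from a client before
     * moving on to the next client, or null if nothing is pending.
     */
    synchronized C poll() {
        String clientId = rotation.pollFirst();
        if (clientId == null) {
            return null;
        }
        ArrayDeque<C> q = queues.get(clientId);
        C call = q.pollFirst();
        if (q.isEmpty()) {
            queues.remove(clientId); // nothing left: drop the client's entry
        } else {
            rotation.addLast(clientId); // more work: go to the back of the line
        }
        return call;
    }

    /** Number of calls currently pending for a client. */
    synchronized int pending(String clientId) {
        ArrayDeque<C> q = queues.get(clientId);
        return q == null ? 0 : q.size();
    }
}
```

With `maxPerClient = 2`, a third call from client `A` is refused while two are pending. If `A` enqueues `a1` and `a2` and then `B` enqueues `b1`, successive `poll()` calls return `a1`, `b1`, `a2`, so `B` is served before `A`'s backlog is drained. A real server would also need handler threads to block when nothing is pending (for example with `wait`/`notifyAll` or a `java.util.concurrent` lock), which this sketch omits.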
--
This message is automatically generated by JIRA.