Improve the Scalability and Robustness of IPC
---------------------------------------------
Key: HADOOP-2864
URL: https://issues.apache.org/jira/browse/HADOOP-2864
Project: Hadoop Core
Issue Type: Improvement
Components: ipc
Affects Versions: 0.16.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.17.0
This jira is intended to enhance IPC's scalability and robustness.
Currently an IPC server can easily hang due to a disk failure or garbage
collection, during which it cannot respond to clients promptly. This has
caused many dropped calls and delayed responses, so many running
applications fail with timeouts. On the other hand, if busy clients send a
lot of requests to the server in a short period of time, or too many clients
communicate with the server simultaneously, the server may be swamped by
requests and become unresponsive.
The proposed changes aim to:
# Provide better client/server coordination.
#* The server should be able to throttle clients during a burst of requests.
#* A slow client should not prevent the server from serving other clients.
#* A temporarily hung server should not cause catastrophic failures in clients.
# Client/server should detect remote side failures. Examples of failures
include: (1) the remote host has crashed; (2) the remote host has crashed and
then rebooted; (3) the remote process has crashed or been shut down by an operator.
# Fairness. Each client should be able to make progress.
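
To make the throttling and fairness goals above concrete, here is a minimal
sketch of one possible admission policy. It is not Hadoop's actual IPC code:
the class name ThrottledCallQueue, its methods, and both limits are
hypothetical and chosen only for illustration. The idea is that the server
refuses new calls once a total cap or a per-client cap is reached, so one
busy client cannot fill the whole queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch, not the real org.apache.hadoop.ipc.Server.
class ThrottledCallQueue {
    private final int maxTotal;      // assumed cap on all queued calls
    private final int maxPerClient;  // assumed cap on queued calls per client
    // Each entry is {clientId, call}; calls are served in arrival order.
    private final Queue<String[]> calls = new ArrayDeque<>();
    // Number of calls currently queued for each client.
    private final Map<String, Integer> pending = new HashMap<>();

    ThrottledCallQueue(int maxTotal, int maxPerClient) {
        this.maxTotal = maxTotal;
        this.maxPerClient = maxPerClient;
    }

    // Admits a call, or returns false without blocking so the caller
    // can tell the client to back off and retry later.
    synchronized boolean offer(String clientId, String call) {
        int queuedForClient = pending.getOrDefault(clientId, 0);
        if (calls.size() >= maxTotal || queuedForClient >= maxPerClient) {
            return false;
        }
        calls.add(new String[] {clientId, call});
        pending.put(clientId, queuedForClient + 1);
        return true;
    }

    // Returns the oldest queued call, or null if the queue is empty.
    synchronized String poll() {
        String[] entry = calls.poll();
        if (entry == null) {
            return null;
        }
        pending.merge(entry[0], -1, Integer::sum);
        return entry[1];
    }

    synchronized int size() {
        return calls.size();
    }
}
```

With ThrottledCallQueue(3, 2), a client that already has two calls queued is
refused until one of them is served, while a different client can still be
admitted as long as the total cap is not reached. Note that this sketch only
limits admission; calls are still served first-in, first-out, so it does not
by itself guarantee that every client makes progress under all loads.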