Improve the Scalability and Robustness of IPC
---------------------------------------------

                 Key: HADOOP-2864
                 URL: https://issues.apache.org/jira/browse/HADOOP-2864
             Project: Hadoop Core
          Issue Type: Improvement
          Components: ipc
    Affects Versions: 0.16.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.17.0


This JIRA aims to improve the scalability and robustness of IPC.

Currently an IPC server can easily hang because of a disk failure or a garbage
collection pause, during which it cannot respond to clients promptly. This has
caused many dropped calls and delayed responses, so many running
applications fail with timeouts. Conversely, if busy clients send many
requests to the server in a short period of time, or too many clients
communicate with the server simultaneously, the server may be swamped by
requests and become unresponsive.

The proposed changes aim to achieve the following:
# Better client/server coordination.
#* The server should be able to throttle clients during bursts of requests.
#* A slow client should not prevent the server from serving other clients.
#* A temporarily hung server should not cause catastrophic failures in clients.
# Detection of remote-side failures by both client and server. Examples of
failures include: (1) the remote host crashes; (2) the remote host crashes and
then reboots; (3) the remote process crashes or is shut down by an operator.
# Fairness: each client should be able to make progress.
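
One common way to let a server throttle clients during bursts is to put a bounded queue between the threads that read requests off the wire and the handler threads that execute them. The following Java sketch is a minimal illustration under stated assumptions: the class `BoundedCallQueue`, its `Call` type, the method names, and the capacity and timeout values are all hypothetical and are not part of Hadoop's IPC API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Hadoop's API): a server-side call queue with a
 * fixed capacity. When handlers fall behind (e.g. during a GC pause or a
 * slow disk), the queue fills up and the reader waits briefly and then
 * refuses the call, instead of buffering an unbounded backlog in memory.
 */
public class BoundedCallQueue {
    /** A queued RPC call; the fields are illustrative only. */
    public static final class Call {
        public final int clientId;
        public final String method;

        public Call(int clientId, String method) {
            this.clientId = clientId;
            this.method = method;
        }
    }

    private final BlockingQueue<Call> queue;
    private final long offerTimeoutMs;

    public BoundedCallQueue(int capacity, long offerTimeoutMs) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.offerTimeoutMs = offerTimeoutMs;
    }

    /**
     * Called by a reader thread. Waits up to offerTimeoutMs for free space and
     * returns false if the queue is still full, so the caller can make the
     * client back off (throttling) rather than queueing without limit.
     */
    public boolean enqueue(Call call) throws InterruptedException {
        return queue.offer(call, offerTimeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called by a handler thread; blocks until a call is available. */
    public Call take() throws InterruptedException {
        return queue.take();
    }

    public int size() {
        return queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedCallQueue q = new BoundedCallQueue(2, 10);
        System.out.println(q.enqueue(new Call(1, "getFileInfo"))); // true
        System.out.println(q.enqueue(new Call(2, "getFileInfo"))); // true
        // Queue is full and no handler is draining it: the third call is refused.
        System.out.println(q.enqueue(new Call(3, "getFileInfo"))); // false
        q.take(); // a handler frees one slot
        System.out.println(q.enqueue(new Call(3, "getFileInfo"))); // true
    }
}
```

In this sketch, when `enqueue` returns false the reader could stop reading from that connection for a while, letting TCP flow control push back on the client. The trade-off is that a full queue delays new calls during a burst, but server memory no longer grows without bound.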

