Kihwal Lee created HADOOP-9229:
----------------------------------
Summary: IPC: Retry on connection reset or socket timeout during
SASL negotiation
Key: HADOOP-9229
URL: https://issues.apache.org/jira/browse/HADOOP-9229
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.7
Reporter: Kihwal Lee
When an RPC server is overloaded, incoming connections may not be accepted in
time, causing the listen queue to overflow. The impact on the client depends on
the OS in use. On Linux, connections in this state look fully connected
to the client, but they have no buffers, so any data sent to the server
is dropped.
This is not a problem for protocols in which the client first waits for the
server's greeting. Even for client-speaks-first protocols, it is fine if the
overload is transient and such connections are accepted before the
retransmission of the dropped packets arrives. Otherwise, clients can hit a
socket timeout after several retransmissions. In certain situations, the
connection is reset while the client is still waiting for an ACK.
We have seen this happen to IPC clients during SASL negotiation. Since no
call has been sent at that point, we should allow a retry when a connection
reset or socket timeout occurs at this stage.
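
The proposed behavior could look roughly like the sketch below. This is not
the actual Hadoop IPC client code: HandshakeRetry, Handshake, isRetriable and
connectWithRetry are hypothetical names used only for illustration, and
detecting a reset by matching the exception message is an assumption, since
Java has no dedicated exception type for it. The Handshake callback stands in
for connection setup plus SASL negotiation, that is, everything that happens
before the first call is sent.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: retries the pre-call handshake only.
final class HandshakeRetry {

    /** Stand-in for connection setup plus SASL negotiation (assumed shape). */
    interface Handshake {
        void run() throws IOException;
    }

    /** Treats socket timeouts and connection resets as retriable. */
    static boolean isRetriable(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return true;
        }
        // Assumption: a reset surfaces as an IOException whose message
        // contains "connection reset"; the exact text is platform dependent.
        String msg = e.getMessage();
        return msg != null
            && msg.toLowerCase(Locale.ROOT).contains("connection reset");
    }

    /**
     * Runs the handshake, retrying on retriable failures.
     * Returns the number of attempts used; rethrows the last failure
     * once maxAttempts is exhausted, and any non-retriable failure at once.
     */
    static int connectWithRetry(Handshake handshake, int maxAttempts,
                                long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handshake.run();
                return attempt;
            } catch (IOException e) {
                if (!isRetriable(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    // Give the overloaded server time to drain its queue.
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

Retrying is safe here only because the callback covers the stage before any
call has been sent, so no request can be executed twice. A real client would
also need to close the failed socket and open a new connection inside the
callback before negotiating again.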