indefinite hang in IPC under some circumstances
-----------------------------------------------
Key: HBASE-1754
URL: https://issues.apache.org/jira/browse/HBASE-1754
Project: Hadoop HBase
Issue Type: Bug
Reporter: Andrew Purtell
If a regionserver crashes while a client is engaged in IPC with it at a
vulnerable point in the TCP FSM (ESTABLISHED, no outstanding data to send), the
IPC call blocks indefinitely. It only unblocks once the regionserver is
restarted and the connection is reset at the TCP level. However, if the client
is colocated with the regionserver on the same host, the regionserver cannot be
restarted, because the OS considers port 60020 bound and in use until the
client is forcibly killed. Killing some types of applications can be very
painful, especially long-running processes that cannot redo work from a
checkpoint and must start over from the beginning. Investigate whether TCP
keepalives can be enabled at the IPC level.
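
As a possible starting point for that investigation, here is a minimal sketch
of enabling keepalives on a client socket. It is not HBase's actual IPC code.
java.net.Socket exposes SO_KEEPALIVE through setKeepAlive(boolean). The class
and method names (KeepAliveSketch, newKeepAliveSocket, connect) are
hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of HBase: shows where SO_KEEPALIVE could be
// switched on when an IPC client opens its connection.
class KeepAliveSketch {

    /** Returns an unconnected socket with SO_KEEPALIVE enabled. */
    static Socket newKeepAliveSocket() throws IOException {
        Socket socket = new Socket();
        // Ask the OS to probe an idle connection so that a dead peer is
        // eventually detected even when no data is outstanding.
        socket.setKeepAlive(true);
        return socket;
    }

    /** Connects a keepalive-enabled socket, closing it if the connect fails. */
    static Socket connect(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = newKeepAliveSocket();
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        } catch (IOException e) {
            socket.close();
            throw e;
        }
        return socket;
    }
}
```

Note that setKeepAlive only turns the option on. Probe timing is controlled by
the OS, and on Linux the default idle time before the first probe is two hours
(tcp_keepalive_time = 7200 seconds), so it would likely need tuning before
keepalives shorten a hang like the one described above.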