[ https://issues.apache.org/jira/browse/HBASE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-1754:
----------------------------------
Assignee: Andrew Purtell
Status: Patch Available (was: Open)
> indefinite hang in IPC under some circumstances
> -----------------------------------------------
>
> Key: HBASE-1754
> URL: https://issues.apache.org/jira/browse/HBASE-1754
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Attachments: HBASE-1754.patch
>
>
> If a regionserver crashes while the client is engaged in IPC with it at a
> vulnerable point in the TCP FSM (ESTABLISHED, no outstanding data to send),
> the client receives no indication that the peer is gone, and the IPC call
> will be stuck waiting "forever" (more than 12 hours). This hoses the client,
> especially if it is trying to look up a region in META. Worse, if the hung
> client is colocated with the regionserver on the same host, the regionserver
> cannot be restarted unless the client is forcibly killed, because the OS
> will consider port 60020 bound and in use. Killing some types of
> applications -- especially long-running processes which can't redo work from
> a checkpoint but must start over from the beginning -- can be very painful.
> Investigate whether TCP keepalives can be enabled at the IPC level.
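
For illustration only, here is a minimal sketch of what enabling TCP keepalives on a client connection could look like in Java. It is not the contents of HBASE-1754.patch. The class name KeepAliveSketch, the method connectWithKeepAlive, and the connect timeout parameter are hypothetical names chosen for this example. Only the java.net.Socket calls are real API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not taken from the HBase IPC code.
class KeepAliveSketch {

    /**
     * Opens a TCP connection with SO_KEEPALIVE enabled, so that the OS
     * probes an idle ESTABLISHED connection and eventually fails it
     * if the peer has vanished without closing the socket.
     */
    static Socket connectWithKeepAlive(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        // Ask the OS to send keepalive probes on this connection.
        socket.setKeepAlive(true);
        // Bound the connect itself; blocking reads on the socket are not affected.
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        return socket;
    }
}
```

Note that SO_KEEPALIVE only switches probing on. How long an idle connection waits before the first probe is decided by the OS; on Linux the default idle time (tcp_keepalive_time) is 7200 seconds, so a dead peer would still take over two hours to detect unless that setting is lowered. A read timeout via Socket.setSoTimeout is a complementary option if faster failure is needed.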