[ https://issues.apache.org/jira/browse/HBASE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1754:
----------------------------------

    Assignee: Andrew Purtell
      Status: Patch Available  (was: Open)

> indefinite hang in IPC under some circumstances
> -----------------------------------------------
>
>                 Key: HBASE-1754
>                 URL: https://issues.apache.org/jira/browse/HBASE-1754
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>         Attachments: HBASE-1754.patch
>
>
> If a regionserver crashes while the client is engaged in IPC with it at a
> vulnerable point in the TCP FSM (ESTABLISHED, no outstanding data to send),
> the IPC call will be stuck waiting "forever" (more than 12 hours). This hoses
> the client, especially if it is trying to look up a region in META. Worse, if
> the hung client is colocated with the regionserver on the same host, the
> regionserver cannot be restarted unless the client is forcibly killed,
> because the OS will consider port 60020 bound and in use. Killing some types
> of applications can be very painful, especially long-running processes which
> can't redo work from a checkpoint but must start over from the beginning.
> Investigate whether TCP keepalives can be enabled at the IPC level.
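
For reference, a minimal sketch of what enabling TCP keepalives on a client socket looks like in plain Java. This is not the attached patch and not HBase's IPC code: the class name KeepAliveSketch and the method connectWithKeepAlive are hypothetical, and the demo talks to a throwaway local listener rather than a regionserver. It only shows the standard java.net.Socket call involved.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; names here are not taken from HBase.
final class KeepAliveSketch {

    /**
     * Opens a TCP connection with SO_KEEPALIVE requested, so the OS can probe
     * an idle connection and eventually report a dead peer to the caller.
     */
    static Socket connectWithKeepAlive(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        socket.setKeepAlive(true); // request keepalive probes on idle connections
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in listener on an ephemeral local port (assumption for the demo).
        try (ServerSocket server = new ServerSocket(0);
             Socket client = connectWithKeepAlive("127.0.0.1", server.getLocalPort(), 1000)) {
            System.out.println("SO_KEEPALIVE enabled: " + client.getKeepAlive());
        }
    }
}
```

Note that Socket.setKeepAlive only turns the option on. How long a connection must sit idle before the first probe is decided by the operating system (on Linux the default is two hours, via net.ipv4.tcp_keepalive_time), so a fix along these lines may also need OS-level tuning to detect a dead peer quickly.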

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
