[ 
https://issues.apache.org/jira/browse/HBASE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-1754:
----------------------------------

         Priority: Minor  (was: Major)
    Fix Version/s: 0.21.0
                   0.20.0
          Summary: use TCP keepalives  (was: indefinite hang in IPC under some circumstances)

> use TCP keepalives
> ------------------
>
>                 Key: HBASE-1754
>                 URL: https://issues.apache.org/jira/browse/HBASE-1754
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: HBASE-1754.patch
>
>
> If a regionserver crashes while the client is engaged in IPC with it at a 
> vulnerable point in the TCP FSM (ESTABLISHED, no outstanding data to send), 
> the IPC will be stuck waiting "forever" (> 12 hours, etc.). This hoses the 
> client, especially if it is trying to look up a region in META. Worse, it is 
> not possible to restart the regionserver if the hung client is colocated with 
> it on the same host, because the OS will consider port 60020 bound and in 
> use, unless the client is forcibly killed. Killing some types of applications 
> -- especially long running processes which can't redo work from a checkpoint 
> but must start over from the beginning -- can be very painful. Investigate 
> whether TCP keepalives can be enabled at the IPC level.
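
For illustration only, here is a minimal sketch of what enabling keepalives on a client socket could look like in Java. It is not the content of HBASE-1754.patch; the class name KeepAliveSketch and the helper connectWithKeepAlive are hypothetical. The only real API relied on is java.net.Socket.setKeepAlive(boolean), which turns on SO_KEEPALIVE so the OS probes an idle ESTABLISHED connection and eventually reports a dead peer instead of waiting indefinitely.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    // Hypothetical helper (not from the patch): open a client socket with
    // SO_KEEPALIVE enabled before connecting.
    static Socket connectWithKeepAlive(InetSocketAddress addr, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Ask the OS to send keepalive probes when the connection is idle.
            socket.setKeepAlive(true);
            socket.connect(addr, connectTimeoutMs);
            return socket;
        } catch (IOException e) {
            // Do not leak the descriptor if option setting or connect fails.
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A throwaway local listener stands in for the regionserver port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connectWithKeepAlive(
                     new InetSocketAddress(loopback, server.getLocalPort()), 5000)) {
            System.out.println("SO_KEEPALIVE enabled: " + client.getKeepAlive());
        }
    }
}
```

Note that Socket.setKeepAlive only switches the option on; the probe timing comes from the OS. With stock Linux settings the first probe is sent after 7200 seconds of idle time, so a dead peer would be detected after a bit over two hours unless the kernel keepalive parameters are tuned.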

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
