[ 
https://issues.apache.org/jira/browse/HDFS-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720279#comment-13720279
 ] 

Suresh Srinivas commented on HDFS-5016:
---------------------------------------

bq. I agree with Todd, this looks like the same deadlock (and basically the 
same fix) as what we have at HDFS-4851.
This patch is slightly different in that it also adds a timeout for the writer 
thread. I would prefer to get this in with the timeout configurable (I am going 
to post a patch in a couple of minutes) as soon as possible, given that this is 
marked as a release blocker (and rightfully so).

Let's either close HDFS-4851 as a duplicate or, if you want some of the changes 
from that issue, make them as part of HDFS-4851.
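
For readers unfamiliar with the idea, here is a rough, generic sketch of 
bounding a blocking write with a configurable timeout so that the calling 
thread cannot hang forever. This is not the code from the HDFS-5016 patch; the 
class name BoundedWrite, the method runWithTimeout, and the timeoutMs parameter 
are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; not taken from the Hadoop code base. */
public class BoundedWrite {

    /**
     * Runs a potentially blocking operation on a separate daemon thread and
     * waits at most timeoutMs milliseconds for it to finish.
     *
     * @return true if the operation completed normally in time; false if it
     *         timed out or threw an exception
     */
    static boolean runWithTimeout(Runnable op, long timeoutMs)
            throws InterruptedException {
        // Daemon thread so a stuck operation cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-writer");
            t.setDaemon(true);
            return t;
        });
        Future<?> result = pool.submit(op);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Give up on the stuck operation and interrupt its thread.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The operation itself failed; report it as not completed.
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With something like this, the caller decides what to do when the write does 
not finish in time (for example, drop the connection) instead of blocking 
indefinitely while holding a lock that another thread needs.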
                
> Heartbeating thread blocks under some failure conditions leading to loss of 
> datanodes
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-5016
>                 URL: https://issues.apache.org/jira/browse/HDFS-5016
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Suresh Srinivas
>            Priority: Blocker
>             Fix For: 2.1.0-beta
>
>         Attachments: HDFS-5016.1.patch, HDFS-5016.patch, jstack1.txt
>
>
> In the testing of some failure scenarios for HBase MTTR, we have been 
> simulating node failures via firewalling of nodes (where all communication 
> ports would be firewalled except ssh's port). We have noticed that when a 
> (data)node is firewalled, we lose certain other datanodes - those that were 
> involved in some communication with the firewalled node before the latter was 
> firewalled. Will attach jstack output from one of the lost datanodes. The 
> heartbeating thread seems to be locked up.
> This jira is to track a fix for the problem.
