[ 
https://issues.apache.org/jira/browse/HDFS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749562#action_12749562
 ] 

Allen Wittenauer commented on HDFS-577:
---------------------------------------

Just to put what I suspect is the basic concern to rest :), I don't want a 
double check on every block report/heartbeat. But I think it might be useful 
if the name node attempted to connect back to the data node every so often, at 
some long interval [probably another configurable :( ].
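
To make the idea concrete, here is a rough sketch of such an infrequent reverse
check. It is not taken from the HDFS code base: the class name ReverseProbe,
the method canConnect, and the timeout parameter are all made up for
illustration, and it only tests that a TCP connection to the data node's
address can be opened, which is a weaker statement than "the data node is
healthy".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not an HDFS class.
class ReverseProbe {
    // Returns true if a TCP connection to host:port opens within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat as unreachable.
            return false;
        }
    }
}
```

A name node could run something like this from a low-frequency timer rather
than on every heartbeat, which keeps the extra cost small.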

I'm trying to think of a use case where one-way communication between the data 
node and the name node would be beneficial, and I'm coming up empty.

As to the network partitioning problem (where data nodes lose connectivity to 
each other, but name node still has connectivity), it may be worthwhile to 
have an algorithm such that if x% of the data nodes cannot communicate, then we enter 
safe mode.  From a practical perspective, chances are good the job tracker is 
going to go down in flames in those sorts of situations anyway since the 
tasktrackers should end up on the dead pile.  Even in a pure HDFS setup, at 
some point the replication list is going to get very large if we start 
declaring nodes dead based on a percentage, so we're probably better off to just safemode 
ourselves and alert the admin that the network is horked.
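
The x% rule could be sketched along these lines. This is only an illustration
under assumptions: PartitionPolicy, shouldEnterSafeMode, and the
thresholdPercent parameter are hypothetical names, not existing HDFS code or
configuration, and how a data node gets counted as unreachable is left open.

```java
// Hypothetical policy class for illustration only; not part of HDFS.
class PartitionPolicy {
    // True when the share of unreachable data nodes is at or above
    // thresholdPercent (the "x%" from the comment above).
    static boolean shouldEnterSafeMode(int unreachable, int total,
                                       double thresholdPercent) {
        if (total <= 0) {
            return false; // no registered data nodes, nothing to judge
        }
        double percentUnreachable = 100.0 * unreachable / total;
        return percentUnreachable >= thresholdPercent;
    }
}
```

With a threshold of 25, for example, 30 unreachable nodes out of 100 would
trip safe mode while 10 out of 100 would not; in the second case the normal
dead-node handling would still apply to those 10.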

> Name node doesn't always properly recognize health of data node
> ---------------------------------------------------------------
>
>                 Key: HDFS-577
>                 URL: https://issues.apache.org/jira/browse/HDFS-577
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Allen Wittenauer
>
> The one-way communication (data node -> name node) for node health does not 
> guarantee that the data node is actually healthy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
