[
https://issues.apache.org/jira/browse/HDFS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749562#action_12749562
]
Allen Wittenauer commented on HDFS-577:
---------------------------------------
Just to put what I suspect is the basic concern to rest :), I don't want a
double check on every block report/heartbeat. But I think it might be useful
if the name node periodically attempted to connect back to the data node, at
a long interval [probably another configurable :( ].
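
A minimal sketch of that occasional reverse check, assuming a plain TCP
connect is an acceptable health signal. The class ReverseProbe, its methods,
and both parameters are hypothetical names for illustration, not existing
Hadoop APIs or configuration keys.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop: decides when the name node
// should try to reach a data node, and performs the connection attempt.
final class ReverseProbe {
    private final long probeIntervalMillis;  // assumed "another configurable"
    private final int connectTimeoutMillis;  // assumed connect timeout

    ReverseProbe(long probeIntervalMillis, int connectTimeoutMillis) {
        this.probeIntervalMillis = probeIntervalMillis;
        this.connectTimeoutMillis = connectTimeoutMillis;
    }

    // True once at least one full interval has passed since the last probe,
    // so the check does not run on every block report or heartbeat.
    boolean isProbeDue(long lastProbeMillis, long nowMillis) {
        return nowMillis - lastProbeMillis >= probeIntervalMillis;
    }

    // Attempts a TCP connection from the name node side to the data node.
    // Any I/O failure (refused, unreachable, timed out) counts as unreachable.
    boolean canConnect(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

With an interval of, say, ten minutes, isProbeDue would gate canConnect so
the extra traffic stays negligible next to regular heartbeats.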
I'm trying to think of a use case where it would be beneficial/useful if data
node/name node had only one-way communication, and I'm coming up empty.
As to the network partitioning problem (where data nodes lose connectivity to
each other, but the name node still has connectivity), it may be worthwhile to
have an algorithm such that if x% of data nodes cannot communicate, then we
enter safe mode. From a practical perspective, chances are good the job tracker is
going to go down in flames in those sorts of situations anyway since the
tasktrackers should end up on the dead pile. Even in a pure HDFS setup, at
some point the replication list is going to get very large if we start
declaring nodes dead based upon %... so probably better off to just safemode
ourselves and alert the admin that the network is horked.
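
The threshold idea could be sketched roughly as follows. PartitionPolicy and
its threshold parameter are hypothetical names, and how the count of
unreachable nodes is gathered is left open; this only illustrates the
"x% unreachable, so enter safe mode instead of declaring nodes dead" decision.

```java
// Hypothetical policy class, not part of Hadoop: decides whether the name
// node should enter safe mode rather than mark many nodes dead at once.
final class PartitionPolicy {
    private final double safeModeThresholdPercent;  // the "x%" from the comment

    PartitionPolicy(double safeModeThresholdPercent) {
        if (safeModeThresholdPercent <= 0.0 || safeModeThresholdPercent > 100.0) {
            throw new IllegalArgumentException("threshold must be in (0, 100]");
        }
        this.safeModeThresholdPercent = safeModeThresholdPercent;
    }

    // Returns true when the share of unreachable data nodes meets or exceeds
    // the threshold. An empty cluster never triggers safe mode here.
    boolean shouldEnterSafeMode(int unreachableNodes, int totalNodes) {
        if (totalNodes <= 0) {
            return false;
        }
        double percentUnreachable = 100.0 * unreachableNodes / totalNodes;
        return percentUnreachable >= safeModeThresholdPercent;
    }
}
```

Entering safe mode at the threshold avoids queueing a very large
re-replication list for what is really a network fault.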
> Name node doesn't always properly recognize health of data node
> ---------------------------------------------------------------
>
> Key: HDFS-577
> URL: https://issues.apache.org/jira/browse/HDFS-577
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Allen Wittenauer
>
> The one-way communication (data node -> name node) for node health does not
> guarantee that the data node is actually healthy.