[
https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977781#action_12977781
]
jinglong.liujl commented on HDFS-1541:
--------------------------------------
I think this approach will hide the root cause. Even in a large cluster that has
left safemode, we will hit the same issue caused by concurrent block reports.
The root cause is that the current heartbeat is too heavyweight: too much work is
done inside it, such as block reports, block-received notifications, and task
assignment. To keep any one of these from blocking other datanodes' heartbeats, we
could use a separate RPC (a lightweight heartbeat) to keep nodes alive. This
heartbeat would only update a timestamp, so datanodes are not marked lost, and it
should not require the FSNamesystem lock.
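As a rough sketch of the idea (names like KeepAliveService, keepAlive() and the
lastContact map are hypothetical, not the existing HDFS API): the handler only
touches a per-datanode timestamp in a concurrent map, so it never needs the
global FSNamesystem lock and cannot be delayed by block-report processing.
{code:java}
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical lightweight keep-alive service. Datanodes ping it on a short
 * interval; the only state it touches is a per-node "last contact" timestamp,
 * so it never takes the FSNamesystem lock and heavy heartbeat work (block
 * reports, block-received, task assignment) cannot delay liveness tracking.
 */
public class KeepAliveService {

  // storageID -> last time (ms) the datanode was heard from
  private final ConcurrentHashMap<String, Long> lastContact =
      new ConcurrentHashMap<String, Long>();

  /** Lightweight RPC: record that the datanode is alive. */
  public void keepAlive(String storageID) {
    lastContact.put(storageID, System.currentTimeMillis());
  }

  /** The dead-node monitor would consult this timestamp instead of the
      heavyweight heartbeat's. */
  public boolean isDead(String storageID, long expiryMs) {
    Long last = lastContact.get(storageID);
    return last == null || System.currentTimeMillis() - last > expiryMs;
  }
}
{code}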
> Not marking datanodes dead When namenode in safemode
> ----------------------------------------------------
>
> Key: HDFS-1541
> URL: https://issues.apache.org/jira/browse/HDFS-1541
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: deadnodescheck.patch
>
>
> In a big cluster, when the namenode starts up, it takes a long time for the
> namenode to process block reports from all datanodes. Because heartbeat
> processing gets delayed, some datanodes are erroneously marked as dead, and
> later on they have to register again, thus wasting time.
> It would speed up startup if the checking of dead nodes were disabled
> while the namenode is in safemode.