[ https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977781#action_12977781 ]

jinglong.liujl commented on HDFS-1541:
--------------------------------------

I think this approach will hide the root cause. In a large cluster, even after we 
leave safemode, we will hit the same issue caused by concurrent block reports. 
The root cause is that the current heartbeat is too heavyweight: too much work is 
done in it, such as block reports, block received notifications, and task 
assignment. To keep any one of these from blocking other datanodes' heartbeats, 
we could use a separate RPC (a lightweight heartbeat) as a keep-alive. This 
heartbeat would only update the timestamp, so datanodes are not marked lost, and 
it should not require the FSNamesystem lock.
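
For illustration only, a minimal Java sketch of such a lock-free keep-alive path; 
the class and method names below are assumptions for this sketch, not existing 
HDFS code:

// Hypothetical keep-alive handler: stamps the datanode's last-seen time
// without taking the FSNamesystem lock.
import java.util.concurrent.ConcurrentHashMap;

public class LightweightHeartbeatMonitor {
    // datanode storage id -> last keep-alive time (ms); a lock-free map so the
    // keep-alive path never contends with block-report processing
    private final ConcurrentHashMap<String, Long> lastSeen =
        new ConcurrentHashMap<String, Long>();
    private final long expiryMs;

    public LightweightHeartbeatMonitor(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Called by the lightweight keep-alive RPC: only updates the timestamp.
    public void keepAlive(String datanodeId) {
        lastSeen.put(datanodeId, System.currentTimeMillis());
    }

    // The dead-node check consults only this map; heavy work (block reports,
    // block received, task assignment) stays on the existing heartbeat RPC.
    public boolean isDead(String datanodeId) {
        Long t = lastSeen.get(datanodeId);
        return t == null || System.currentTimeMillis() - t > expiryMs;
    }
}

The point of the sketch is that the keep-alive path touches only a concurrent 
map, so it can never be stalled by the FSNamesystem lock held during block-report 
processing.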

> Not marking datanodes dead When namenode in safemode
> ----------------------------------------------------
>
>                 Key: HDFS-1541
>                 URL: https://issues.apache.org/jira/browse/HDFS-1541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.23.0
>
>         Attachments: deadnodescheck.patch
>
>
> In a big cluster, when the namenode starts up, it takes a long time for the 
> namenode to process block reports from all datanodes. Because heartbeat 
> processing gets delayed, some datanodes are erroneously marked as dead and 
> later have to register again, wasting time.
> It would speed up startup if the dead-node check were disabled while the 
> namenode is in safemode.
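
For illustration, a minimal Java sketch of the approach described in the issue 
above (skip the dead-node check while in safe mode); the interface and method 
names are assumptions for this sketch, not the contents of deadnodescheck.patch:

// Hypothetical heartbeat-check loop that skips marking datanodes dead while
// the namenode is still in safe mode.
public class HeartbeatCheck {
    interface NameSystem {
        boolean isInSafeMode();           // assumed accessor
        Iterable<String> liveDatanodes(); // assumed accessor
        boolean isExpired(String dn);     // assumed: heartbeat older than limit
        void markDead(String dn);         // assumed: remove from live set
    }

    private final NameSystem ns;

    public HeartbeatCheck(NameSystem ns) {
        this.ns = ns;
    }

    public void heartbeatCheck() {
        // During startup safe mode, block reports are still flooding in, so an
        // expired heartbeat most likely means "namenode is busy", not
        // "datanode is gone". Skip the check entirely.
        if (ns.isInSafeMode()) {
            return;
        }
        for (String dn : ns.liveDatanodes()) {
            if (ns.isExpired(dn)) {
                ns.markDead(dn);
            }
        }
    }
}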

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.