[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843106#comment-16843106 ]

He Xiaoqiao commented on HDFS-14186:
------------------------------------

Hi [~kihwal], thanks for your valuable comments.
{quote}This is different from waiting for all replica locations for all blocks, 
which will not happen if any node is down. A similar concept is there as 
"staleness" check after a HA transition. 
{quote}
That is true. We could use another counter for the total number of replicas 
and check whether it reaches a threshold (perhaps a separate percentage, such 
as 99.9%). I agree that it is not necessary to wait for all replicas to be 
reported before leaving safe mode.
{quote}Standby safe mode can optionally get the list of nodes from the active, 
if running, and do this check to determine whether it is truly ready to take 
over.{quote}
This may not be the best choice. I actually considered getting information 
from the active NN (ANN) and checking a condition to determine whether the 
standby is ready to leave safe mode. However, there may be no ANN running, for 
example when the whole cluster is restarted, or when the ANN unfortunately hits 
an exception while the SBN is starting.
I think the FsImage plus the replayed EditLog give us enough information to 
calculate the expected total number of replicas. Of course, the actual number 
of replicas may be lower or higher than expected (for example, some replicas 
may be pending reconstruction and others pending deletion), but this difference 
should normally be very small. So I think it is feasible to check only a 
threshold on all replicas, as the demo patch [^HDFS-14186.001.patch] shows. 
Thanks again.
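A minimal Java sketch of the replica-level threshold check described above, assuming the expected replica count has already been accumulated while loading the FsImage and replaying the EditLog. The class and method names (`ReplicaSafeModeCheck`, `addExpected`, `replicaReported`, `canLeave`) are hypothetical; they are not taken from the HDFS code base or from the attached patch, and the 0.999 threshold only mirrors the 99.9% example.

```java
/**
 * Hypothetical sketch: track expected vs. reported replicas during startup
 * and decide whether a replica-level safe mode threshold has been reached.
 * This is NOT the HDFS safe mode implementation.
 */
class ReplicaSafeModeCheck {
  private final double thresholdPct; // e.g. 0.999 for 99.9%
  private long expectedReplicas;     // derived from FsImage + EditLog replay
  private long reportedReplicas;     // incremented as block reports arrive

  ReplicaSafeModeCheck(double thresholdPct) {
    if (thresholdPct <= 0 || thresholdPct > 1) {
      throw new IllegalArgumentException("thresholdPct must be in (0, 1]");
    }
    this.thresholdPct = thresholdPct;
  }

  /** Hypothetical hook: called once per block while loading the namespace. */
  void addExpected(int replication) {
    expectedReplicas += replication;
  }

  /** Hypothetical hook: called for each replica reported by a datanode. */
  void replicaReported() {
    reportedReplicas++;
  }

  /** True once reported replicas reach thresholdPct of the expected total. */
  boolean canLeave() {
    if (expectedReplicas == 0) {
      return true; // empty namespace: nothing to wait for
    }
    long needed = (long) Math.ceil(thresholdPct * expectedReplicas);
    return reportedReplicas >= needed;
  }
}
```

With 1,000 blocks at replication 3 (3,000 expected replicas) and a 0.999 threshold, the check passes once 2,997 replicas have been reported. A threshold below 100% absorbs the small drift caused by replicas pending reconstruction or deletion, so the NameNode does not wait on every last replica.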

> blockreport storm slow down namenode restart seriously in large cluster
> -----------------------------------------------------------------------
>
>                 Key: HDFS-14186
>                 URL: https://issues.apache.org/jira/browse/HDFS-14186
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-14186.001.patch
>
>
> In the current implementation, a datanode sends its block report immediately 
> after it successfully registers with the namenode on restart, and the 
> resulting block report storm puts the namenode under heavy load. One 
> consequence is that some received RPCs have to be skipped because their queue 
> time exceeds the timeout. If a datanode's heartbeat RPCs are skipped 
> continuously for long enough (heartbeatExpireInterval, 630s by default), the 
> datanode is marked DEAD. It then has to re-register and send its block report 
> again, which aggravates the block report storm and traps the cluster in a 
> vicious circle. This can seriously slow down namenode startup (by an hour or 
> more), especially in a large (several thousand datanodes) and busy cluster. 
> Although there has been a lot of work to optimize namenode startup, the issue 
> still exists. 
> I propose to postpone the dead datanode check until the namenode has finished 
> startup.
> Any comments and suggestions are welcome.
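A minimal Java sketch of where the 630s default comes from and how the proposed postponement could gate the dead-node decision. The expiry formula mirrors how DatanodeManager derives heartbeatExpireInterval from dfs.namenode.heartbeat.recheck-interval (default 300000 ms) and dfs.heartbeat.interval (default 3 s); the `startupComplete` flag and the class and method names are hypothetical and only illustrate the idea, not the attached patch.

```java
/**
 * Hypothetical sketch of the dead-node decision plus the proposed
 * postponement during namenode startup. Not HDFS code.
 */
class DeadNodeCheckSketch {
  // Defaults of dfs.namenode.heartbeat.recheck-interval (ms)
  // and dfs.heartbeat.interval (s).
  static final long RECHECK_INTERVAL_MS = 300_000L;
  static final long HEARTBEAT_INTERVAL_S = 3L;

  /** 2 * recheck + 10 heartbeats; 630,000 ms (630 s) with the defaults. */
  static long heartbeatExpireIntervalMs(long recheckMs, long heartbeatSec) {
    return 2 * recheckMs + 10 * 1000 * heartbeatSec;
  }

  /**
   * Returns true if the datanode should be marked DEAD.
   * Hypothetical gate: while the namenode is still starting up, skip the
   * check so delayed heartbeats do not force re-registration and another
   * full block report.
   */
  static boolean shouldMarkDead(long lastHeartbeatMs, long nowMs,
                                boolean startupComplete) {
    if (!startupComplete) {
      return false;
    }
    long expire = heartbeatExpireIntervalMs(RECHECK_INTERVAL_MS, HEARTBEAT_INTERVAL_S);
    return lastHeartbeatMs < nowMs - expire;
  }
}
```

For example, a datanode whose last heartbeat was 700 s ago would be marked DEAD after startup, but not while startup is still in progress.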



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
