[jira] [Created] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster

He Xiaoqiao (JIRA) Sat, 05 Jan 2019 04:31:45 -0800

He Xiaoqiao created HDFS-14186:
----------------------------------

             Summary: blockreport storm slow down namenode restart seriously in large cluster
                 Key: HDFS-14186
                 URL: https://issues.apache.org/jira/browse/HDFS-14186
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
            Reporter: He Xiaoqiao
            Assignee: He Xiaoqiao



In the current implementation, a restarting datanode sends its block report
immediately after it registers with the namenode, and the resulting block
report storm puts the namenode under heavy load. One consequence is that some
received RPCs are skipped because they have waited in the queue longer than
the timeout. If a datanode's heartbeat RPCs keep being skipped for long enough
(heartbeatExpireInterval, 630s by default), the datanode is marked DEAD. It
then has to re-register and send its block report again, which aggravates the
storm and traps the namenode in a vicious circle. In large (several thousand
datanodes) and busy clusters in particular, this can slow namenode startup
down by more than an hour. Although a lot of work has gone into optimizing
namenode startup, the issue still exists.
I propose to postpone the dead datanode check until the namenode has finished
startup.
Any comments and suggestions are welcome.
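A minimal sketch of the idea, assuming the standard defaults
dfs.namenode.heartbeat.recheck-interval = 300000 ms and dfs.heartbeat.interval
= 3 s. The 630s figure comes from 2 * recheck interval + 10 * heartbeat
interval. The class PostponedDeadNodeCheck, its methods, and the choice to
measure expiry from the later of "last heartbeat" and "startup finished" are
hypothetical illustrations, not the actual HDFS DatanodeManager code or a
committed patch.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not HDFS code: derives the default heartbeat expiry
 * interval and shows one way a dead-node check could be postponed until
 * namenode startup has finished.
 */
public class PostponedDeadNodeCheck {

    /** Expiry = 2 * recheck interval + 10 * heartbeat interval. */
    static long heartbeatExpireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * TimeUnit.SECONDS.toMillis(heartbeatIntervalSec);
    }

    private final long expireIntervalMs;
    // Time at which startup finished; -1 while the namenode is still starting up.
    private long startupCompleteMs = -1;

    PostponedDeadNodeCheck(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    /** Assumed hook: called once the namenode considers startup finished. */
    void markStartupComplete(long nowMs) {
        startupCompleteMs = nowMs;
    }

    /**
     * Never reports a node dead during startup. Afterwards, the expiry window
     * is measured from the later of the node's last heartbeat and the end of
     * startup, so heartbeats dropped during the block report storm cannot by
     * themselves push a node into the DEAD state.
     */
    boolean isDead(long lastHeartbeatMs, long nowMs) {
        if (startupCompleteMs < 0) {
            return false; // check postponed while starting up
        }
        long baseline = Math.max(lastHeartbeatMs, startupCompleteMs);
        return nowMs - baseline > expireIntervalMs;
    }

    public static void main(String[] args) {
        long expire = heartbeatExpireIntervalMs(300_000L, 3L);
        System.out.println("expire interval (ms): " + expire); // 630000, i.e. 630 s

        PostponedDeadNodeCheck checker = new PostponedDeadNodeCheck(expire);
        long lastHeartbeat = 0L; // heartbeat last seen at t = 0

        // Still starting up: not dead, even though 1000 s have passed.
        System.out.println(checker.isDead(lastHeartbeat, 1_000_000L)); // false

        checker.markStartupComplete(1_000_000L);
        // 500 s after startup finished: still within the grace window.
        System.out.println(checker.isDead(lastHeartbeat, 1_500_000L)); // false
        // 700 s after startup finished with no heartbeat: now considered dead.
        System.out.println(checker.isDead(lastHeartbeat, 1_700_000L)); // true
    }
}
```

The trade-off in this sketch is that a datanode that really fails during
startup is detected later, at most one expiry interval after startup
finishes.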


