[ 
https://issues.apache.org/jira/browse/HDFS-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865911#comment-16865911
 ] 

Stephen O'Donnell commented on HDFS-14576:
------------------------------------------

As [~jojochuang] mentioned, we used to see a lot of issues like this, but in 
later CDH versions several patches have been backported that made the initial 
block report problem largely disappear.  Unfortunately I don't have the list of 
Jiras and their relative impact.

Have you investigated using dfs.blockreport.initialDelay for the datanodes? I 
believe that will cause the datanode to delay its initial block report by a 
random interval between zero and that setting. If you know your average startup 
time for the cluster, perhaps you could set that value to something close to 
the average startup time and then hopefully the DNs would send their initial 
block reports over that interval rather than all at once, spreading the load 
more evenly.
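
To make the spreading effect concrete, here is a minimal sketch. It assumes each datanode picks a delay uniformly at random between zero and the configured value, as described above. It is not the HDFS implementation. The class name, method names, the 600 second window and the 500 node cluster size are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch only: shows how a randomized initial delay spreads block reports over a window. */
class InitialDelaySketch {

    /**
     * Picks one start delay per datanode, uniformly in [0, maxDelaySeconds).
     * This mirrors the behaviour described in the comment; it is not HDFS code.
     */
    static long[] pickDelays(int datanodes, long maxDelaySeconds, Random rng) {
        long[] delays = new long[datanodes];
        for (int i = 0; i < datanodes; i++) {
            delays[i] = (long) (rng.nextDouble() * maxDelaySeconds);
        }
        return delays;
    }

    /** Counts how many initial block reports fall into each bucket of bucketSeconds. */
    static int[] histogram(long[] delays, long maxDelaySeconds, long bucketSeconds) {
        // Round up so a partial final bucket is still counted; keep at least one bucket.
        int buckets = (int) Math.max(1, (maxDelaySeconds + bucketSeconds - 1) / bucketSeconds);
        int[] counts = new int[buckets];
        for (long d : delays) {
            counts[(int) (d / bucketSeconds)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long maxDelay = 600; // hypothetical: roughly the cluster's average startup time, in seconds
        int datanodes = 500; // hypothetical cluster size
        long[] delays = pickDelays(datanodes, maxDelay, new Random(42));
        int[] perMinute = histogram(delays, maxDelay, 60);
        System.out.println("Initial block reports per minute: " + Arrays.toString(perMinute));
    }
}
```

With a delay of zero every report would land in the first bucket. With a window close to the startup time, the namenode should see roughly datanodes / minutes reports per minute instead of all of them at once.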

For info, what version of HDFS are you running where you see these problems?

> Avoid block report retry and slow down namenode startup
> -------------------------------------------------------
>
>                 Key: HDFS-14576
>                 URL: https://issues.apache.org/jira/browse/HDFS-14576
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> During namenode startup, the load is very high because the namenode has to 
> process every datanode's block report one by one. If hundreds of datanode 
> block reports are pending, the problem becomes more serious, even though 
> #processFirstBlockReport is processed far more efficiently than ordinary 
> block reports. Some datanodes then retry their block reports, which lengthens 
> the restart time. I think we should filter out block report requests (datanode 
> block report retries) that have already been processed and return directly, 
> which would shorten the restart time. Note that the benefit of this proposal 
> may be noticeable only on large clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

