Improve namenode restart times by short-circuiting the first block reports from datanodes
-----------------------------------------------------------------------------------------
Key: HDFS-1295
URL: https://issues.apache.org/jira/browse/HDFS-1295
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Fix For: 0.22.0
Namenode restart time is dominated by block report processing. On a 2000-node
cluster with 90 million blocks, processing block reports takes 30 to 40
minutes. For each incoming block report, the namenode computes a "diff" between
the report and the contents of the blocksMap, and then applies that diff to the
blocksMap. When the report is the first one from a datanode, however, there is
no need to compute the diff: the blocksMap does not yet record any replicas for
that datanode, so every reported block can simply be added.
This change speeds up block report processing by 300%.
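A minimal Java sketch of the two code paths described above, under a deliberately simplified in-memory model. The class `BlockReportSketch`, its `blocksMap` field (a `HashMap` from block ID to the set of datanode IDs holding a replica), and the methods `processReport`, `processFirstReport` and `blockReport` are hypothetical names for illustration only; they are not the real HDFS classes or the patch attached to this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the namenode's block bookkeeping.
// NOT the real HDFS BlocksMap: block ID -> IDs of datanodes holding a replica.
public class BlockReportSketch {
    final Map<Long, Set<String>> blocksMap = new HashMap<>();
    // Datanodes that have already sent at least one block report.
    final Set<String> reportedOnce = new HashSet<>();

    // Diff path: compare the report with what the map already records.
    void processReport(String datanode, Set<Long> reported) {
        // Blocks currently attributed to this datanode (full scan: a simplification).
        Set<Long> known = new HashSet<>();
        for (Map.Entry<Long, Set<String>> e : blocksMap.entrySet()) {
            if (e.getValue().contains(datanode)) {
                known.add(e.getKey());
            }
        }
        // toAdd = reported - known; toRemove = known - reported.
        Set<Long> toAdd = new HashSet<>(reported);
        toAdd.removeAll(known);
        Set<Long> toRemove = new HashSet<>(known);
        toRemove.removeAll(reported);
        for (long b : toAdd) {
            blocksMap.computeIfAbsent(b, k -> new HashSet<>()).add(datanode);
        }
        for (long b : toRemove) {
            blocksMap.get(b).remove(datanode);
        }
    }

    // Short-circuit path: nothing is recorded for this datanode yet,
    // so every reported block is simply added without computing a diff.
    void processFirstReport(String datanode, Set<Long> reported) {
        for (long b : reported) {
            blocksMap.computeIfAbsent(b, k -> new HashSet<>()).add(datanode);
        }
    }

    // Dispatch: Set.add returns true only the first time a datanode is seen.
    void blockReport(String datanode, Set<Long> reported) {
        if (reportedOnce.add(datanode)) {
            processFirstReport(datanode, reported);
        } else {
            processReport(datanode, reported);
        }
    }

    public static void main(String[] args) {
        BlockReportSketch nn = new BlockReportSketch();
        nn.blockReport("dn1", Set.of(1L, 2L, 3L)); // first report: no diff
        nn.blockReport("dn1", Set.of(2L, 3L, 4L)); // later report: diff applied
        System.out.println(nn.blocksMap);
    }
}
```

The diff path here scans the whole map to find a datanode's existing blocks, which is only a simplification for illustration; the point of the sketch is that, on an empty starting state, the first-report path reaches the same result as the diff path while doing no comparison work at all.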