[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887673#action_12887673
 ] 

dhruba borthakur commented on HDFS-1295:
----------------------------------------

> what happens with blocks in the first block report from a DN that do not 
> belong to any file

They are discarded as usual inside FSNamesystem.addStoredBlock(). This method 
checks whether the block belongs to any inode and inserts it into the 
blocksMap only if it does (this is existing code and is not modified by this 
patch).
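
As a rough illustration of that check, here is a minimal sketch. It is not the 
actual FSNamesystem code: the class name, the fields, and the use of a file 
path as a stand-in for the inode are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the namenode's bookkeeping.
class AddStoredBlockSketch {
    // block id -> path of the owning file (stands in for the inode)
    private final Map<Long, String> fileOfBlock = new HashMap<>();
    // block id -> datanodes known to hold a replica (stands in for the blocksMap)
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();

    // Registers a block as part of a file, as the namespace would at load time.
    void addFileBlock(long blockId, String path) {
        fileOfBlock.put(blockId, path);
    }

    // Records a reported replica only if the block belongs to some file.
    // Returns false when the block is discarded.
    boolean addStoredBlock(long blockId, String datanode) {
        if (!fileOfBlock.containsKey(blockId)) {
            return false; // no owning file: drop the reported block
        }
        blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        return true;
    }

    // Datanodes recorded for a block; empty if the block was never stored.
    Set<String> locations(long blockId) {
        return blocksMap.getOrDefault(blockId, Collections.emptySet());
    }
}
```

In this sketch a block reported by a datanode but unknown to the namespace 
never reaches the map, so nothing has to be cleaned up for it later.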

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1295
>                 URL: https://issues.apache.org/jira/browse/HDFS-1295
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>
>         Attachments: shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks, block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocksMap, and then applies 
> these diffs to the blocksMap. In reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.
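
The idea in the description can be sketched as follows. This is only an 
illustration under stated assumptions, not the code in the attached patch: the 
class, its fields, and both method names are hypothetical, and the per-datanode 
set is a simplification of the real blocksMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: which blocks belong to files, and which blocks
// the namenode has recorded for each datanode.
class FirstBlockReportSketch {
    final Set<Long> fileBlocks = new HashSet<>();
    final Map<String, Set<Long>> blocksOnNode = new HashMap<>();

    // General path: diff the report against what is already recorded
    // for this datanode, then apply the additions and removals.
    void processWithDiff(String dn, List<Long> report) {
        Set<Long> previous = blocksOnNode.computeIfAbsent(dn, k -> new HashSet<>());
        Set<Long> reported = new HashSet<>(report);
        List<Long> toAdd = new ArrayList<>();
        List<Long> toRemove = new ArrayList<>();
        for (Long b : reported) {
            if (!previous.contains(b)) toAdd.add(b);
        }
        for (Long b : previous) {
            if (!reported.contains(b)) toRemove.add(b);
        }
        for (Long b : toAdd) {
            if (fileBlocks.contains(b)) previous.add(b); // orphans are dropped
        }
        previous.removeAll(toRemove);
    }

    // Short-circuit path: on the first report nothing is recorded yet for
    // this datanode, so every reported block is an addition and no diff
    // is needed.
    void processFirstReport(String dn, List<Long> report) {
        Set<Long> stored = blocksOnNode.computeIfAbsent(dn, k -> new HashSet<>());
        for (Long b : report) {
            if (fileBlocks.contains(b)) stored.add(b); // orphans are dropped
        }
    }
}
```

When nothing has been recorded for the datanode, both methods in this sketch 
leave the same state behind; the second one simply skips building the 
add/remove lists.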

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
