[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888574#action_12888574
 ] 

Konstantin Shvachko commented on HDFS-1295:
-------------------------------------------

Not discarded, but rather ignored by {{addStoredBlock()}}. Therefore, with your 
patch the {{toInvalidate}} list of blocks will not be calculated or processed, 
so deletion of replicas that don't belong to any file will be delayed on the 
data-nodes until the next block report. 
My point is that there is a trade-off, which you are not mentioning here, unless 
I missed something.
The trade-off is: _you will start faster, but space cleanup will be delayed._
The only way I can see to fix it is to send a second block report right after 
the first one, which will double the load on the NN during startup.
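
To make the trade-off concrete, here is a minimal sketch of what a report diff 
computes. It is a simplified model, not the actual {{reportDiff()}} code: the 
class {{ReportDiffSketch}}, the parameters {{knownBlocks}} and {{storedOnNode}}, 
and the use of plain block ids are all assumptions made for illustration. On a 
first report {{storedOnNode}} is empty, so nothing lands in {{toRemove}}, but 
{{toInvalidate}} can still be non-empty, and that is the part a short-circuited 
first report would skip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of a block-report diff; not the HDFS code. */
class ReportDiffSketch {
    // Reported, belongs to a file, not yet recorded for this data-node.
    final List<Long> toAdd = new ArrayList<>();
    // Recorded for this data-node, but missing from the report.
    final List<Long> toRemove = new ArrayList<>();
    // Reported, but belongs to no file: the replica should be deleted.
    final List<Long> toInvalidate = new ArrayList<>();

    /**
     * @param knownBlocks  ids of blocks that belong to some file (assumed model)
     * @param storedOnNode ids already recorded for this data-node
     * @param report       ids the data-node says it holds
     */
    static ReportDiffSketch diff(Set<Long> knownBlocks,
                                 Set<Long> storedOnNode,
                                 List<Long> report) {
        ReportDiffSketch d = new ReportDiffSketch();
        Set<Long> seen = new HashSet<>();
        for (long id : report) {
            seen.add(id);
            if (!knownBlocks.contains(id)) {
                d.toInvalidate.add(id);
            } else if (!storedOnNode.contains(id)) {
                d.toAdd.add(id);
            }
        }
        for (long id : storedOnNode) {
            if (!seen.contains(id)) {
                d.toRemove.add(id);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // First report after restart: nothing recorded for the node yet.
        Set<Long> known = new HashSet<>(Arrays.asList(1L, 2L));
        ReportDiffSketch d =
            diff(known, new HashSet<Long>(), Arrays.asList(1L, 2L, 3L));
        System.out.println("toAdd=" + d.toAdd
            + " toRemove=" + d.toRemove
            + " toInvalidate=" + d.toInvalidate);
    }
}
```

In this model block 3 is reported but belongs to no file, so it only shows up 
if the diff is actually computed.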

What is interesting, though, is that the numbers you present show that 
constructing the {{LinkedList}} in {{reportDiff()}} is time-consuming: the 
actual speedup comes from reusing the same {{Block}} object rather than 
creating a new one for each processed block, as {{reportDiff()}} does.
So maybe if we address this, we can optimize overall block processing, 
including the startup time.
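
A rough sketch of the difference I mean, under stated assumptions: 
{{MutableBlock}} is a hypothetical stand-in for {{Block}}, and the report is 
modeled as a flat {{long[]}} with three values per block (id, length, 
generation stamp). The first method allocates one object and one list node per 
block, the second reuses a single scratch object. Both produce the same result; 
only the allocation pattern differs.

```java
import java.util.LinkedList;
import java.util.List;

/** Illustrative only: per-block allocation versus one reused object. */
class BlockReuseSketch {
    /** Hypothetical stand-in for an HDFS Block. */
    static final class MutableBlock {
        long id;
        long numBytes;
        long genStamp;

        MutableBlock set(long id, long numBytes, long genStamp) {
            this.id = id;
            this.numBytes = numBytes;
            this.genStamp = genStamp;
            return this;
        }
    }

    // Assumed encoding: (id, length, generation stamp) per block.
    static final int LONGS_PER_BLOCK = 3;

    /** Allocates a new object and a list node for every reported block. */
    static long totalBytesAllocating(long[] report) {
        List<MutableBlock> blocks = new LinkedList<>();
        for (int i = 0; i < report.length; i += LONGS_PER_BLOCK) {
            blocks.add(new MutableBlock()
                .set(report[i], report[i + 1], report[i + 2]));
        }
        long total = 0;
        for (MutableBlock b : blocks) {
            total += b.numBytes;
        }
        return total;
    }

    /** Reuses one scratch object for the whole report. */
    static long totalBytesReusing(long[] report) {
        MutableBlock scratch = new MutableBlock();
        long total = 0;
        for (int i = 0; i < report.length; i += LONGS_PER_BLOCK) {
            scratch.set(report[i], report[i + 1], report[i + 2]);
            total += scratch.numBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] report = {1L, 100L, 7L, 2L, 250L, 7L};
        System.out.println(totalBytesAllocating(report));
        System.out.println(totalBytesReusing(report));
    }
}
```

The reused object is only safe when nothing keeps a reference to it past the 
current iteration, which is why the diff lists cannot simply share it.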

Btw, if it helps, you can use NNThroughputBenchmark to measure block report 
processing on a single node.

> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1295
>                 URL: https://issues.apache.org/jira/browse/HDFS-1295
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>
>         Attachments: shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
