[jira] Commented: (HADOOP-1079) DFS Scalability: optimize processing time of block reports

Raghu Angadi (JIRA) Thu, 08 Mar 2007 11:20:46 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479412
 ]


Raghu Angadi commented on HADOOP-1079:
--------------------------------------


With deltas, how do we handle the case where a namenode request to remove a 
block is lost? I think the two sides can't stay in sync if only one side 
maintains diffs.
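Here is a toy Python model of the lost-delete concern. All class and method names are hypothetical, not Hadoop's. The model makes two assumptions. First, the datanode computes deltas against its own last report. Second, the namenode updates its view of the datanode as soon as it issues a delete command. If that command is lost, neither side ever reports the difference, so the two views never reconverge without a full report.

```python
class DataNode:
    """Toy datanode that reports only changes since its last report."""

    def __init__(self, blocks):
        self.blocks = set(blocks)
        self.last_reported = set(blocks)  # assumed: a full report was just sent

    def delete(self, block_id):
        self.blocks.discard(block_id)

    def incremental_report(self):
        added = self.blocks - self.last_reported
        removed = self.last_reported - self.blocks
        self.last_reported = set(self.blocks)
        return added, removed


class NameNode:
    """Toy namenode holding its belief about one datanode's blocks."""

    def __init__(self, blocks):
        self.view = set(blocks)

    def request_delete(self, datanode, block_id, lost=False):
        # Assumption: the namenode forgets the block when it issues the command.
        self.view.discard(block_id)
        if not lost:  # a lost command never reaches the datanode
            datanode.delete(block_id)

    def apply(self, report):
        added, removed = report
        self.view |= added
        self.view -= removed


def simulate_lost_delete():
    dn = DataNode({1, 2, 3})
    nn = NameNode({1, 2, 3})
    nn.request_delete(dn, 2, lost=True)
    # Nothing changed on the datanode, so its delta is empty.
    nn.apply(dn.incremental_report())
    return nn.view, dn.blocks


nn_view, dn_blocks = simulate_lost_delete()
# namenode believes {1, 3}; datanode still holds {1, 2, 3}
```

Every later delta is also empty for block 2, so only a full report (or some other sync check) exposes the mismatch.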

> In my test case, there were no two succeeding hourly block reports that were 
> identical. 

This is normal. Did most of the reports result in deletion or addition of 
blocks during the report? 

> I had 1800 data nodes and was running randomWriter. In this scenario, using a 
> hash to identify identical block reports might not buy us anything. 

The hash is not meant to identify whether the block report has changed. Both 
sides keep an up-to-date hash: if the hashes are the same, the namenode and 
datanode have the same set of blocks. This has no relation to the previous 
block report. Some fixes might be needed to make sure both sides see the same 
set during a block report.
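One way to realize "both sides keep an up-to-date hash" is an order-independent set hash that each side updates in constant time per block change. The sketch below is an assumption, not the scheme proposed in this issue. It XORs a truncated SHA-256 digest of each block ID; `block_digest` and `BlockSetHash` are hypothetical names. Because XOR is its own inverse, the value depends only on the current set and not on the history that produced it. That property matches the point that the hash has no relation to the previous report.

```python
import hashlib


def block_digest(block_id: int) -> int:
    """64-bit digest of one block ID (hypothetical choice: truncated SHA-256)."""
    raw = hashlib.sha256(block_id.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(raw[:8], "big")


class BlockSetHash:
    """Order-independent hash of a block set, updated in O(1) per change.

    Adding and removing a block both XOR its digest into the value, so the
    result depends only on which blocks are currently present.
    """

    def __init__(self, blocks=()):
        self.blocks = set()
        self.value = 0
        for b in blocks:
            self.add(b)

    def add(self, block_id):
        if block_id not in self.blocks:  # ignore duplicates
            self.blocks.add(block_id)
            self.value ^= block_digest(block_id)

    def remove(self, block_id):
        if block_id in self.blocks:
            self.blocks.remove(block_id)
            self.value ^= block_digest(block_id)


datanode_side = BlockSetHash([1, 2, 3])
namenode_side = BlockSetHash([3, 1])  # built in a different order...
namenode_side.add(2)                  # ...and through a different history
in_sync = datanode_side.value == namenode_side.value  # True
```

A 64-bit XOR hash can collide, so equal hashes mean the sets match only with high probability. Unequal hashes would trigger a fall back to a full report.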



> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>
>                 Key: HADOOP-1079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that remain practically the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of 
> a full block report) be incremental. This will make the namenode process only 
> those blocks that were added/deleted in the last period.
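The numbers in the description can be checked directly. With 1800 hourly reports, the namenode receives one report every 3600 / 1800 = 2 seconds. At 50000 blocks per report, that averages 25000 block entries per second. The sketch below (all names hypothetical) pairs that arithmetic with a minimal version of the incremental proposal. The first report is full. Once the namenode has acknowledged it, later reports carry only the blocks added or deleted since then, so the namenode's work scales with the number of changes rather than with the total block count.

```python
def full_report_load(datanodes=1800, blocks_per_node=50_000, interval_s=3600):
    """Average namenode load when every datanode sends a full report per interval."""
    seconds_between_reports = interval_s / datanodes
    blocks_per_second = datanodes * blocks_per_node / interval_s
    return seconds_between_reports, blocks_per_second


class IncrementalReporter:
    """Hypothetical datanode-side reporter: full report first, deltas afterwards."""

    def __init__(self):
        self.last_acked = None  # no full report has been acknowledged yet

    def next_report(self, current_blocks):
        current = set(current_blocks)
        if self.last_acked is None:
            return "full", current, set()
        return "delta", current - self.last_acked, self.last_acked - current

    def ack(self, reported_blocks):
        # Called only after the namenode confirms it processed the report.
        self.last_acked = set(reported_blocks)


def namenode_apply(view, report):
    """Update the namenode's view of one datanode; mutates and returns `view`."""
    kind, added, removed = report
    if kind == "full":
        view.clear()
        view |= added   # cost proportional to all blocks on the datanode
    else:
        view |= added   # cost proportional to the number of changes
        view -= removed
    return view


print(full_report_load())  # (2.0, 25000.0)

reporter = IncrementalReporter()
view = set()
namenode_apply(view, reporter.next_report({1, 2, 3}))
reporter.ack({1, 2, 3})
report = reporter.next_report({2, 3, 4})  # block 1 deleted, block 4 added
namenode_apply(view, report)
reporter.ack({2, 3, 4})
```

In this sketch the datanode still diffs its full set against the last acknowledged one. That moves the comparison off the namenode rather than eliminating it.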
