[
https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481644
]
Raghu Angadi commented on HADOOP-1079:
--------------------------------------
> Actually I don't know what happens in such a case now. Whatever the datanode
> has is the master copy.
> Not sure what happens to blocks added after the report is sent but before it
> is processed.
This does lead to real problems; see HADOOP-1093.
> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>
> Key: HADOOP-1079
> URL: https://issues.apache.org/jira/browse/HADOOP-1079
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000
> blocks and sends a block report to the namenode once every hour. This means
> that the namenode processes a block report once every 2 seconds. Each block
> report contains all blocks that the datanode currently hosts. This makes the
> namenode compare a huge number of blocks that remain practically the same
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of
> a full block report) be incremental. This will make the namenode process only
> those blocks that were added/deleted in the last period.
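The incremental-report proposal above could be sketched roughly as follows. This is only an illustration in Python, not Hadoop's actual code: the class DatanodeBlockTracker, the function apply_report, and the dict-based report format are all hypothetical names made up for this example. The sketch assumes the datanode remembers what the namenode last acknowledged, so that blocks added after a report is built but before it is acknowledged simply appear in the next report.

```python
def apply_report(known, report):
    """Namenode side (hypothetical): update the known block set from a report."""
    if report["kind"] == "full":
        # A full report replaces whatever was known before.
        return set(report["blocks"])
    # An incremental report only touches the blocks that changed.
    return (known | report["added"]) - report["deleted"]


class DatanodeBlockTracker:
    """Datanode side (hypothetical): decides what to put in each report."""

    def __init__(self):
        self.blocks = set()    # blocks currently stored on this datanode
        self.reported = None   # namenode's view as of the last acknowledged
                               # report; None until a full report succeeds

    def add_block(self, block_id):
        self.blocks.add(block_id)

    def delete_block(self, block_id):
        self.blocks.discard(block_id)

    def build_report(self):
        if self.reported is None:
            # No successful full report yet: send everything.
            return {"kind": "full", "blocks": set(self.blocks)}
        # Otherwise send only the difference from the last acknowledged state.
        return {
            "kind": "incremental",
            "added": self.blocks - self.reported,
            "deleted": self.reported - self.blocks,
        }

    def ack(self, report):
        # Call only after the namenode confirms it processed the report.
        self.reported = apply_report(self.reported, report)


dn = DatanodeBlockTracker()
dn.add_block(1)
dn.add_block(2)
first = dn.build_report()            # full report
namenode_view = apply_report(set(), first)
dn.ack(first)

dn.add_block(3)
dn.delete_block(1)
second = dn.build_report()           # incremental: one add, one delete
dn.add_block(4)                      # arrives after the report was built
namenode_view = apply_report(namenode_view, second)
dn.ack(second)

third = dn.build_report()            # picks up block 4
namenode_view = apply_report(namenode_view, third)
```

In this sketch the cost of computing the difference moves to the datanode, and the namenode's work per report is proportional to the number of changed blocks rather than the total number of blocks hosted. A lost or unacknowledged report is harmless here because the next report is computed against the last acknowledged state.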