[jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports

Todd Lipcon (JIRA) Thu, 14 Jul 2011 13:59:26 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065530#comment-13065530
 ]


Todd Lipcon commented on HDFS-395:
----------------------------------

bq. Todd, please correct me if I am wrong, but I don't see that this is 
happening in FSDataset.invalidate(). 

Ah, it seems we have three separate but similar AsyncDiskServices in Hadoop :) 
The MRAsyncDiskService class from MAPREDUCE-1302 uses a separate "toBeDeleted" 
directory as described above. I thought we used the same technique on the HDFS 
side, but it appears to be just a deferred delete of the file in its original 
location, as you described.
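
To make the difference between the two approaches concrete, here is a minimal 
sketch. It is not the actual MRAsyncDiskService or FSDataset code: the class 
name DeleteStrategies, both method names, and the single-threaded executor are 
hypothetical, and the rename step is my assumption of how a "toBeDeleted" 
directory would be used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not the real MRAsyncDiskService or FSDataset API. */
public class DeleteStrategies {
    // One background thread standing in for a per-volume deletion pool (assumption).
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /**
     * Strategy A: move the file into a "toBeDeleted" directory, then delete it
     * in the background. The original path is free as soon as this returns.
     * ATOMIC_MOVE requires both paths to be on the same file system.
     */
    public Future<?> moveThenDelete(Path file, Path toBeDeleted) throws IOException {
        Files.createDirectories(toBeDeleted);
        // Prefix with a timestamp so two files with the same name do not collide.
        Path staged = toBeDeleted.resolve(System.nanoTime() + "_" + file.getFileName());
        Files.move(file, staged, StandardCopyOption.ATOMIC_MOVE);
        return pool.submit(() -> {
            Files.deleteIfExists(staged);
            return null;
        });
    }

    /**
     * Strategy B: deferred delete in place. The file stays visible at its
     * original path until the background task actually runs.
     */
    public Future<?> deferredDelete(Path file) {
        return pool.submit(() -> {
            Files.deleteIfExists(file);
            return null;
        });
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

With strategy A a caller can treat the original path as gone once the method 
returns; with strategy B that only holds after the returned Future completes.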

Can we share some code here? There's already another AsyncDiskService 
superclass in common.

> DFS Scalability: Incremental block reports
> ------------------------------------------
>
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: blockReportPeriod.patch, explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that practically remain the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of 
> a full block report) be incremental. This will make the namenode process only 
> those blocks that were added/deleted in the last period.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
