[jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports

Suresh Srinivas (JIRA) Fri, 15 Jul 2011 09:59:27 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066071#comment-13066071
 ]


Suresh Srinivas commented on HDFS-395:
--------------------------------------

> But for an excessive replica, it remains in the block map and 
> excessiveBlockMap until an ack is back. They are the ones that need explicit 
> acknowledgment. 

I know that for deleted files, when a previously deleted replica is reported by 
a datanode to the namenode, the NN can delete the replica again because the file 
no longer exists. But I wonder why we do not also remove an excess replica from 
the map when its deletion is scheduled.

However, this could come in very handy in the HA implementation. Currently all 
namespace operations go to the standby through the editlog. Having the delete 
acks creates a channel to report block deletions to the standby as well. So I 
am +1 on delete acks from the perspective of HA.

The directory scanner should use the mechanism in this jira to send the 
difference between the in-memory block map and the disk. This could be done in 
another jira.
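
As a rough illustration of that difference, here is a minimal sketch that 
assumes both the in-memory block map and the on-disk scan can be reduced to 
sets of block IDs. The class and method names are hypothetical and are not the 
Hadoop DirectoryScanner API; a real scanner would also have to compare 
generation stamps and lengths, which this leaves out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; not the Hadoop DirectoryScanner API. */
public class ScanDiff {

    /** Block IDs found on disk but missing from the in-memory map. */
    static Set<Long> onDiskOnly(Set<Long> inMemory, Set<Long> onDisk) {
        Set<Long> result = new TreeSet<>(onDisk);
        result.removeAll(inMemory);
        return result;
    }

    /** Block IDs in the in-memory map that have no file on disk. */
    static Set<Long> inMemoryOnly(Set<Long> inMemory, Set<Long> onDisk) {
        Set<Long> result = new TreeSet<>(inMemory);
        result.removeAll(onDisk);
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data: block 1 vanished from disk, block 4 is unknown in memory.
        Set<Long> inMemory = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> onDisk = new HashSet<>(Arrays.asList(2L, 3L, 4L));

        // Only these two small sets would need to go to the namenode,
        // rather than the full list of blocks.
        System.out.println("report as added:   " + onDiskOnly(inMemory, onDisk));
        System.out.println("report as deleted: " + inMemoryOnly(inMemory, onDisk));
    }
}
```

With the sample sets above, only block 4 would be reported as added and only 
block 1 as deleted; blocks 2 and 3 are unchanged and would not be sent.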

> DFS Scalability: Incremental block reports
> ------------------------------------------
>
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: blockReportPeriod.patch, explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that practically remain the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of 
> a full block report) be incremental. This will make the namenode process only 
> those blocks that were added/deleted in the last period.
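
The figures in the description above can be checked with a short sketch, which 
also shows one possible shape for the per-datanode record of changes that an 
incremental report would carry. All names here are hypothetical and are not 
taken from the attached patches; the sketch assumes reports are spread evenly 
over the hour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names are illustrative, not from the Hadoop source. */
public class IncrementalReportSketch {

    /** Average gap between report arrivals at the namenode, in seconds. */
    static double secondsBetweenReports(int datanodes, int reportPeriodSeconds) {
        return (double) reportPeriodSeconds / datanodes;
    }

    /** Blocks the namenode compares per report period when every report is full. */
    static long blocksPerPeriod(int datanodes, long blocksPerDatanode) {
        return datanodes * blocksPerDatanode;
    }

    /** Datanode-side record of block changes since the last report was sent. */
    static final class PendingDelta {
        private final List<Long> added = new ArrayList<>();
        private final List<Long> deleted = new ArrayList<>();

        void blockAdded(long blockId) { added.add(blockId); }

        void blockDeleted(long blockId) { deleted.add(blockId); }

        /** Number of entries the next incremental report would carry. */
        int size() { return added.size() + deleted.size(); }

        /** Reset once the report has been sent successfully. */
        void clear() {
            added.clear();
            deleted.clear();
        }
    }

    public static void main(String[] args) {
        // 1800 datanodes, one report per hour each: one report every 2 seconds.
        System.out.println(secondsBetweenReports(1800, 3600)); // 2.0
        // 1800 datanodes x 50000 blocks compared every hour with full reports.
        System.out.println(blocksPerPeriod(1800, 50_000L));    // 90000000

        // An incremental report carries only what changed in the last period.
        PendingDelta delta = new PendingDelta();
        delta.blockAdded(101L);
        delta.blockDeleted(42L);
        System.out.println(delta.size()); // 2
        delta.clear();
        System.out.println(delta.size()); // 0
    }
}
```

With full reports the namenode compares about 90 million block entries per 
hour in this cluster. With incremental reports the work is proportional to the 
number of blocks that actually changed in the period.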

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
