[jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports

Tomasz Nykiel (JIRA) Fri, 19 Aug 2011 09:22:53 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087785#comment-13087785
 ]


Tomasz Nykiel commented on HDFS-395:
------------------------------------

@Nicholas

For building, I tried "ant clean compile", "ant clean test", and "ant clean 
compile-hdfs-classes"; all three fail with the same problem.
When I try to fetch, for instance,
"https://repository.apache.org/content/repositories/snapshots/com/sun/jmx/jmxri/1.2.1/jmxri-1.2.1.jar"
(which ant is trying to download) with wget, I get 404 Not Found.

As for the ReceivedDeletedBlockInfo, I still think it's safer to preserve the 
order. It certainly mattered in our FB implementation.
In fact, separate structures were implemented first and then collapsed into one 
structure for both types of acks, solely to preserve the order. In Apache this 
could probably be resolved.

On the other hand, keeping a unified structure for both types makes the 
datanode code cleaner (e.g., synchronization) and introduces minimal space 
overhead.
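
To illustrate the ordering point, here is a minimal sketch of a single queue 
that holds both kinds of acks. It is not taken from any of the attached 
patches: the class PendingBlockAcks, the Status enum, and the add/drain methods 
are hypothetical names chosen for this example. The idea it tries to show is 
that one list under one lock keeps a "received, deleted, received" sequence for 
the same block intact, which two separate lists could not reconstruct.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one ordered queue for both received and deleted acks. */
class PendingBlockAcks {
    /** The kind of event being acknowledged to the namenode. */
    enum Status { RECEIVED, DELETED }

    /** One pending ack; the status tag is the only extra space per entry. */
    static final class Ack {
        final long blockId;
        final Status status;

        Ack(long blockId, Status status) {
            this.blockId = blockId;
            this.status = status;
        }
    }

    // A single list, guarded by this object's monitor, records acks in arrival order.
    private final List<Ack> pending = new ArrayList<>();

    /** Appends an ack; arrival order is the order the namenode will see. */
    synchronized void add(long blockId, Status status) {
        pending.add(new Ack(blockId, status));
    }

    /** Returns all pending acks in arrival order and clears the queue. */
    synchronized List<Ack> drain() {
        List<Ack> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        PendingBlockAcks acks = new PendingBlockAcks();
        // The same block is received, deleted, and received again between reports.
        acks.add(42L, Status.RECEIVED);
        acks.add(42L, Status.DELETED);
        acks.add(42L, Status.RECEIVED);
        for (Ack a : acks.drain()) {
            System.out.println(a.blockId + " " + a.status);
        }
    }
}
```

With two separate lists (one for received, one for deleted), the batch above 
would arrive as two RECEIVED entries and one DELETED entry for block 42, and the 
receiver could not tell whether the block should exist afterwards. A single 
list also needs only one lock, which is the synchronization point mentioned 
above.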


> DFS Scalability: Incremental block reports
> ------------------------------------------
>
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, name-node
>            Reporter: dhruba borthakur
>            Assignee: Tomasz Nykiel
>         Attachments: blockReportPeriod.patch, explicitAcks.patch-3, 
> explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that practically remains the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of 
> a full block report) be incremental. This will make the namenode process only 
> those blocks that were added/deleted in the last period.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
