    [ http://issues.apache.org/jira/browse/HADOOP-620?page=comments#action_12457853 ]

Raghu Angadi commented on HADOOP-620:
-------------------------------------
Dhruba submitted a patch (HADOOP-814) that calculates totalCapacity, totalDiskRemaining, etc. on demand by iterating over all the nodes instead of maintaining global counters. The replication factor can be approximated the same way: replication = (sum of block counts over all nodes) / (size of the global blockMap). I will add this once HADOOP-814 is committed. (A worked example appears below the quoted issue.)

> replication factor should be calculated based on actual dfs block sizes at
> the NameNode.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-620
>                 URL: http://issues.apache.org/jira/browse/HADOOP-620
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>            Priority: Minor
>
> Currently 'dfs -report' calculates the replication factor as:
> (totalCapacity - totalDiskRemaining) / (total size of dfs files in the namespace).
> The problem with this is that it includes disk space used by non-dfs files
> (e.g. map reduce jobs) on the datanode. On my single-node test, I get a
> replication factor of 100 because I have a 1 GB dfs file without replication
> and there is 99 GB of unrelated data on the same volume.
> Ideally, the namenode should calculate it as: (total size of all the blocks known
> to it) / (total size of files in the namespace).
> The initial proposal for keeping 'total size of all the blocks' up to date is to track it
> in the datanode descriptor and update it when the namenode receives block reports
> from the datanode (and subtract it when the datanode is removed).
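To make the difference concrete, here is a small self-contained sketch (plain Java, not actual Hadoop code; the class and variable names such as ReplicationEstimate, blocksPerNode and blockMap are hypothetical, chosen only to illustrate the arithmetic). It reproduces the single-node scenario from the description and compares the current capacity-based estimate with the block-count-based approximation proposed in the comment:

import java.util.*;

public class ReplicationEstimate {

    public static void main(String[] args) {
        // Single-node scenario from the issue description:
        // a ~100 GB volume that is full, 99 GB of it used by non-DFS data,
        // plus one 1 GB DFS file stored without replication.
        long totalCapacity      = 100L << 30;  // ~100 GB
        long totalDiskRemaining = 0L;          // volume is full
        long totalDfsFileSize   = 1L << 30;    // 1 GB of files in the namespace

        // Current 'dfs -report' style estimate: the 99 GB of unrelated data
        // is counted as if it were block replicas, giving a factor of 100.
        double oldEstimate =
            (double) (totalCapacity - totalDiskRemaining) / totalDfsFileSize;

        // Proposed approximation: only blocks the namenode actually knows about.
        // blocksPerNode: node -> ids of the blocks it holds;
        // blockMap: all distinct blocks in the namespace.
        Map<String, Set<Long>> blocksPerNode = new HashMap<>();
        blocksPerNode.put("datanode-1", new HashSet<>(Arrays.asList(1L)));
        Set<Long> blockMap = new HashSet<>(Arrays.asList(1L));

        // Iterate over the nodes (as HADOOP-814 does for capacity) and sum
        // their block counts, then divide by the number of distinct blocks.
        long totalReplicas = 0;
        for (Set<Long> blocks : blocksPerNode.values()) {
            totalReplicas += blocks.size();
        }
        double newEstimate = (double) totalReplicas / blockMap.size();

        System.out.println("capacity-based estimate: " + oldEstimate);  // 100.0
        System.out.println("block-based estimate:    " + newEstimate);  // 1.0
    }
}

With a single 1 GB block and no unrelated data counted, the block-based estimate comes out to 1.0 instead of 100, which is the behaviour the issue asks for.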