    [ http://issues.apache.org/jira/browse/HADOOP-620?page=comments#action_12457853 ]

Raghu Angadi commented on HADOOP-620:
-------------------------------------
Dhruba submitted a patch (HADOOP-814) that calculates totalCapacity, totalDiskRemaining, etc. on demand by iterating over all the nodes instead of maintaining global counters. The replication factor can be approximated the same way: replication = (sum of block counts over all nodes) / (size of the global blockMap). I will add this once HADOOP-814 is committed. (A worked example appears below the quoted issue.)

> replication factor should be calculated based on actual dfs block sizes at
> the NameNode.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-620
>                 URL: http://issues.apache.org/jira/browse/HADOOP-620
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>            Priority: Minor
>
> Currently 'dfs -report' calculates the replication factor as:
> (totalCapacity - totalDiskRemaining) / (total size of dfs files in the namespace).
> The problem with this is that it includes disk space used by non-dfs files
> (e.g. map reduce jobs) on the datanode. On my single-node test, I get a
> replication factor of 100 because I have a 1 GB dfs file without replication
> and there is 99 GB of unrelated data on the same volume.
> Ideally, the namenode should calculate it as: (total size of all the blocks known
> to it) / (total size of files in the namespace).
> The initial proposal for keeping 'total size of all the blocks' up to date is to track it
> in the datanode descriptor and update it when the namenode receives block reports
> from the datanode (and subtract it when the datanode is removed).
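To make the difference concrete, here is a small self-contained sketch (plain Java, not actual Hadoop code; the class and variable names such as ReplicationEstimate, blocksPerNode and blockMap are hypothetical, chosen only to illustrate the arithmetic). It reproduces the single-node scenario from the description and compares the current capacity-based estimate with the block-count-based approximation proposed in the comment:

import java.util.*;

public class ReplicationEstimate {

    public static void main(String[] args) {
        // Single-node scenario from the issue description:
        // a ~100 GB volume that is full, 99 GB of it used by non-DFS data,
        // plus one 1 GB DFS file stored without replication.
        long totalCapacity      = 100L << 30;  // ~100 GB
        long totalDiskRemaining = 0L;          // volume is full
        long totalDfsFileSize   = 1L << 30;    // 1 GB of files in the namespace

        // Current 'dfs -report' style estimate: the 99 GB of unrelated data
        // is counted as if it were block replicas, giving a factor of 100.
        double oldEstimate =
            (double) (totalCapacity - totalDiskRemaining) / totalDfsFileSize;

        // Proposed approximation: only blocks the namenode actually knows about.
        // blocksPerNode: node -> ids of the blocks it holds;
        // blockMap: all distinct blocks in the namespace.
        Map<String, Set<Long>> blocksPerNode = new HashMap<>();
        blocksPerNode.put("datanode-1", new HashSet<>(Arrays.asList(1L)));
        Set<Long> blockMap = new HashSet<>(Arrays.asList(1L));

        // Iterate over the nodes (as HADOOP-814 does for capacity) and sum
        // their block counts, then divide by the number of distinct blocks.
        long totalReplicas = 0;
        for (Set<Long> blocks : blocksPerNode.values()) {
            totalReplicas += blocks.size();
        }
        double newEstimate = (double) totalReplicas / blockMap.size();

        System.out.println("capacity-based estimate: " + oldEstimate);  // 100.0
        System.out.println("block-based estimate:    " + newEstimate);  // 1.0
    }
}

With a single 1 GB block and no unrelated data counted, the block-based estimate comes out to 1.0 instead of 100, which is the behaviour the issue asks for.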