[ https://issues.apache.org/jira/browse/HADOOP-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486537 ]
Raghu Angadi commented on HADOOP-620:
-------------------------------------

Test. Please ignore:
<pre>
 ----
|    |
 ----
</pre>

> replication factor should be calculated based on actual dfs block sizes at
> the NameNode.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-620
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>            Priority: Minor
>
> Currently 'dfs -report' calculates the replication factor as:
> (totalCapacity - totalDiskRemaining) / (total size of dfs files in the namespace).
> The problem with this is that it includes disk space used by non-dfs files
> (e.g. map-reduce jobs) on the datanode. On my single-node test, I get a
> replication factor of 100, since I have a 1 GB dfs file without replication
> and there is 99 GB of unrelated data on the same volume.
> Ideally the namenode should calculate it as: (total size of all the blocks
> known to it) / (total size of files in the namespace).
> The initial proposal for keeping 'total size of all the blocks' updated is to
> track it in the datanode descriptor and update it when the namenode receives
> block reports from the datanode (and subtract it when the datanode is removed).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
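The quoted proposal amounts to a running sum maintained from block reports. Below is a minimal sketch of that accounting under stated assumptions: the class and method names (BlockSizeTracker, onBlockReport, onDatanodeRemoved) are hypothetical and do not reflect the actual DatanodeDescriptor or FSNamesystem API; the real change would live in the namenode's block-report handling.

<pre>
// Hypothetical sketch: track the total size of blocks each datanode
// reports, so the namenode can compute an accurate replication factor.
import java.util.HashMap;
import java.util.Map;

class BlockSizeTracker {
  // Bytes of dfs blocks last reported by each datanode (names illustrative).
  private final Map<String, Long> reportedBytes = new HashMap<String, Long>();
  private long totalBlockBytes = 0;

  // Called when the namenode processes a block report: replace this
  // datanode's previous contribution with the newly reported total.
  synchronized void onBlockReport(String datanodeId, long blockBytesInReport) {
    Long previous = reportedBytes.put(datanodeId, blockBytesInReport);
    totalBlockBytes += blockBytesInReport - (previous == null ? 0 : previous);
  }

  // Called when a datanode is removed: subtract its contribution.
  synchronized void onDatanodeRemoved(String datanodeId) {
    Long previous = reportedBytes.remove(datanodeId);
    if (previous != null) {
      totalBlockBytes -= previous;
    }
  }

  // Replication factor = (bytes of all blocks known to the namenode)
  //                    / (logical bytes of files in the namespace),
  // rather than (capacity - remaining) / namespaceBytes, which wrongly
  // counts non-dfs data sharing the datanode's volumes.
  synchronized double replicationFactor(long namespaceBytes) {
    return namespaceBytes == 0 ? 0.0 : (double) totalBlockBytes / namespaceBytes;
  }
}
</pre>

The design point matches the proposal: each datanode's contribution is replaced wholesale on a block report and subtracted on removal, so the running total never includes unrelated data on the same volume.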