[ http://issues.apache.org/jira/browse/HADOOP-620?page=all ]

Sameer Paranjpye updated HADOOP-620:
------------------------------------

    Component/s: dfs
    Description: 
Currently 'dfs -report' calculates the replication factor as follows:
     (totalCapacity - totalDiskRemaining) / (total size of dfs files in the Name
space).

The problem is that this includes disk space used by non-dfs files (e.g.
map-reduce jobs) on the datanode. On my single-node test, I get a replication
factor of 100, since I have a 1 GB dfs file without replication and there is
99 GB of unrelated data on the same volume.

Ideally, the namenode should calculate it as: (total size of all the blocks
known to it) / (total size of files in the Name space).

The initial proposal for keeping 'total size of all the blocks' up to date is
to track it in the datanode descriptor and update it when the namenode receives
block reports from the datanode (and subtract it when the datanode is removed).
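A small illustrative sketch (plain arithmetic, not Hadoop code; the variable names and volume numbers mirror the single-node scenario above and are hypothetical) contrasting the current capacity-based estimate with the proposed block-based one:

```python
GB = 1024 ** 3

# Hypothetical single-node scenario from the report: one 1 GB dfs file with
# replication 1, plus 99 GB of unrelated non-dfs data on a full 100 GB volume.
total_capacity = 100 * GB
total_disk_remaining = 0 * GB
total_size_of_dfs_files = 1 * GB
total_size_of_blocks = 1 * GB  # blocks actually reported to the namenode

# Current calculation: the 99 GB of non-dfs data is counted as "used by dfs",
# so it inflates the apparent replication factor.
current = (total_capacity - total_disk_remaining) / total_size_of_dfs_files

# Proposed calculation: only block sizes the namenode knows about are counted.
proposed = total_size_of_blocks / total_size_of_dfs_files

print(current)   # 100.0
print(proposed)  # 1.0
```

The proposed formula reports 1.0, matching the actual replication of the file, where the current one reports 100.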




  was: (unchanged)

> replication factor should be calculated based on actual dfs block sizes at 
> the NameNode.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-620
>                 URL: http://issues.apache.org/jira/browse/HADOOP-620
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>            Priority: Minor
>
> Currently 'dfs -report' calculates the replication factor as follows:
>      (totalCapacity - totalDiskRemaining) / (total size of dfs files in the 
> Name space).
> The problem is that this includes disk space used by non-dfs files 
> (e.g. map-reduce jobs) on the datanode. On my single-node test, I get a 
> replication factor of 100, since I have a 1 GB dfs file without replication 
> and there is 99 GB of unrelated data on the same volume.
> Ideally, the namenode should calculate it as: (total size of all the blocks 
> known to it) / (total size of files in the Name space).
> The initial proposal for keeping 'total size of all the blocks' up to date is 
> to track it in the datanode descriptor and update it when the namenode 
> receives block reports from the datanode (and subtract it when the datanode 
> is removed).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
