Re: Hadoop dfs usage and actual size discrepancy

Nick Bailey Wed, 09 Dec 2009 13:25:39 -0800

Output from bottom of fsck report:

 Total size:    8711239576255 B (Total open files size: 3571494 B)
 Total dirs:    391731
 Total files:   2612976 (Files currently being written: 3)
 Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
 Minimally replicated blocks:   2274747 (100.0 %)
 Over-replicated blocks:        75491 (3.3186548 %)
 Under-replicated blocks:       36945 (1.6241367 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.017153
 Corrupt blocks:                0
 Missing replicas:              36945 (0.53830105 %)
 Number of data-nodes:          25
 Number of racks:               1
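Here is a minimal sketch of how the fsck summary supports the "negligible over-replication" reading. It assumes the `Label: value` layout shown above, and other Hadoop versions may format the report differently. The sketch pulls out the logical size, the default replication factor and the average block replication. It then estimates the raw bytes attributable to replication above the default. Applying a per-block average to bytes is an approximation: it assumes over- and under-replicated blocks are roughly average-sized. `parse_fsck_summary` is a hypothetical helper, not part of Hadoop.

```python
import re

# Three summary lines copied from the fsck report above.
FSCK_SUMMARY = """\
 Total size:    8711239576255 B (Total open files size: 3571494 B)
 Default replication factor:    3
 Average block replication:     3.017153
"""

GIB = 1024 ** 3


def parse_fsck_summary(text: str) -> dict:
    """Return logical size, default replication and average replication.

    Hypothetical helper; assumes the 'Label:  value' layout shown in this thread.
    """
    patterns = {
        "total_size": r"Total size:\s+(\d+) B",
        "default_replication": r"Default replication factor:\s+(\d+)",
        "avg_replication": r"Average block replication:\s+([\d.]+)",
    }
    fields = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"fsck summary field not found: {key}")
        fields[key] = match.group(1)
    return {
        "total_size": int(fields["total_size"]),
        "default_replication": int(fields["default_replication"]),
        "avg_replication": float(fields["avg_replication"]),
    }


summary = parse_fsck_summary(FSCK_SUMMARY)
# Approximate raw bytes beyond the default replication (net of under-replication).
excess_replication = summary["avg_replication"] - summary["default_replication"]
extra_raw = summary["total_size"] * excess_replication
print(f"approx. extra raw bytes from replication: {extra_raw / GIB:.1f} GiB")
```

On these numbers the excess comes to roughly 140 GiB, far too little to account for tens of terabytes.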

Output from top of dfsadmin -report:

Total raw bytes: 110689488793600 (100.67 TB)
Remaining raw bytes: 46994184353977 (42.74 TB)
Used raw bytes: 55511654282643 (50.49 TB)
% used: 50.15%

Total effective bytes: 0 (0 KB)
Effective replication multiplier: Infinity
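The dfsadmin numbers can be cross-checked with plain arithmetic. This sketch rests on two labeled assumptions, neither confirmed against the Hadoop source. First, capacity not covered by "Used" plus "Remaining" is taken to be non-DFS or reserved space. Second, "Effective replication multiplier" is guessed to be used raw bytes divided by effective bytes, which would explain the Infinity given 0 effective bytes. The report shows 110689488793600 B as 100.67 TB, which implies binary units, so the sketch divides by 1024^4.

```python
import math

TIB = 1024 ** 4

# Raw byte counts copied from the dfsadmin report above.
total_raw = 110_689_488_793_600
remaining_raw = 46_994_184_353_977
used_raw = 55_511_654_282_643
effective_bytes = 0

# Assumption: capacity that is neither DFS-used nor remaining is occupied by
# non-DFS files or reserved space on the datanode volumes.
unaccounted_raw = total_raw - used_raw - remaining_raw
percent_used = 100 * used_raw / total_raw


def effective_multiplier(used: int, effective: int) -> float:
    """Hypothetical reading of the last report line: used raw / effective bytes.

    For a positive numerator, Java double division by zero yields Infinity;
    mimic that instead of raising ZeroDivisionError as Python would.
    """
    return math.inf if effective == 0 else used / effective


print(f"% used:      {percent_used:.2f}%")
print(f"unaccounted: {unaccounted_raw} B ({unaccounted_raw / TIB:.2f} TiB)")
print(f"multiplier:  {effective_multiplier(used_raw, effective_bytes)}")
```

About 7.44 TiB of capacity is unaccounted for here. That reduces usable space, but it is not counted in "Used", so it does not by itself explain the high usage figure.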

Not sure what the last two lines of the dfsadmin report mean, but according to 
fsck we have a negligible number of over-replicated blocks. The rest of the 
dfsadmin report confirms what the web interface says: the nodes hold far more 
data than 8.6TB * 3.
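Putting the two reports side by side makes the gap concrete. This minimal sketch uses only numbers already quoted in this message. It approximates expected raw usage as fsck's logical size times its average block replication. Because that applies a per-block average to bytes, the result is approximate. It then compares that figure with dfsadmin's used raw bytes, in binary units to match the report.

```python
TIB = 1024 ** 4

# Figures copied from the fsck and dfsadmin output above.
fsck_total_size = 8_711_239_576_255  # logical bytes ("Total size")
avg_replication = 3.017153           # "Average block replication"
dfs_used_raw = 55_511_654_282_643    # "Used raw bytes"

# Raw space the files should occupy if every replica is counted once.
expected_raw = fsck_total_size * avg_replication
unexplained_raw = dfs_used_raw - expected_raw
ratio = dfs_used_raw / expected_raw

print(f"expected raw: {expected_raw / TIB:6.2f} TiB")
print(f"dfs used raw: {dfs_used_raw / TIB:6.2f} TiB")
print(f"unexplained:  {unexplained_raw / TIB:6.2f} TiB ({ratio:.2f}x expected)")
```

On these figures the cluster reports roughly twice the raw usage that the fsck size and replication account for, a gap of about 26.6 TiB.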

Thoughts?

-----Original Message-----
From: "Brian Bockelman" <bbock...@cse.unl.edu>
Sent: Wednesday, December 9, 2009 3:35pm
To: common-user@hadoop.apache.org
Cc: core-u...@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Hey Nick,

Try:

hadoop fsck /
hadoop dfsadmin -report

These should give you information about, for example, the non-HDFS data and the 
average replication factor.

Or is this how you determined you had a replication factor of 3?

Brian

On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:

> We have a Hadoop cluster with 100TB of capacity, and according to the DFS web 
> interface we are using 50% of it (50TB). However, 'hadoop fs -dus /' says the 
> total size of everything is about 8.6TB. Everything has a replication factor 
> of 3, so we should only be using around 26TB of our cluster.
> 
> I've verified the replication factors, and I've also checked the datanode 
> machines to see whether anything unrelated to Hadoop is accidentally being 
> stored on the drives Hadoop uses for storage, but nothing is.
> 
> Has anyone had a similar problem, or does anyone have debugging suggestions?
> 
> Thanks,
> Nick Bailey
>
