Hey Nick,

What's the output of this:
hadoop dfsadmin -report | grep "Non DFS Used" | grep -v "0 KB" | awk '{sum += $4} END {print sum}'

What version of Hadoop is this?

Brian

On Dec 9, 2009, at 10:25 PM, Nick Bailey wrote:

> Output from bottom of fsck report:
>
>  Total size:    8711239576255 B (Total open files size: 3571494 B)
>  Total dirs:    391731
>  Total files:   2612976 (Files currently being written: 3)
>  Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
>  Minimally replicated blocks:   2274747 (100.0 %)
>  Over-replicated blocks:        75491 (3.3186548 %)
>  Under-replicated blocks:       36945 (1.6241367 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.017153
>  Corrupt blocks:                0
>  Missing replicas:              36945 (0.53830105 %)
>  Number of data-nodes:          25
>  Number of racks:               1
>
>
> Output from top of dfsadmin -report:
>
>  Total raw bytes: 110689488793600 (100.67 TB)
>  Remaining raw bytes: 46994184353977 (42.74 TB)
>  Used raw bytes: 55511654282643 (50.49 TB)
>  % used: 50.15%
>
>  Total effective bytes: 0 (0 KB)
>  Effective replication multiplier: Infinity
>
>
> Not sure what the last two lines of the dfsadmin report mean, but we have a
> negligible amount of over-replicated blocks according to fsck. The rest of the
> dfsadmin report confirms what the web interface says in that the nodes have
> way more data than 8.6 TB * 3.
>
> Thoughts?
>
>
>
> -----Original Message-----
> From: "Brian Bockelman" <bbock...@cse.unl.edu>
> Sent: Wednesday, December 9, 2009 3:35pm
> To: common-user@hadoop.apache.org
> Cc: core-u...@hadoop.apache.org
> Subject: Re: Hadoop dfs usage and actual size discrepancy
>
> Hey Nick,
>
> Try:
>
> hadoop fsck /
> hadoop dfsadmin -report
>
> Should give you information about, for example, the non-HDFS data and the
> average replication factor.
>
> Or is this how you determined you had a replication factor of 3?
>
> Brian
>
> On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:
>
>> We have a hadoop cluster with a 100 TB capacity, and according to the dfs web
>> interface we are using 50% of our capacity (50 TB). However, doing 'hadoop fs
>> -dus /' says the total size of everything is about 8.6 TB. Everything has a
>> replication factor of 3, so we should only be using around 26 TB of our
>> cluster.
>>
>> I've verified the replication factors and I've also checked the datanode
>> machines to see if something non-hadoop-related is accidentally being stored
>> on the drives hadoop is using for storage, but nothing is.
>>
>> Has anyone had a similar problem and have any debugging suggestions?
>>
>> Thanks,
>> Nick Bailey
>>
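
A minimal cross-check sketch for the numbers above, assuming the datanodes keep
their blocks under /data/hadoop/dfs/data (that path is hypothetical; substitute
your cluster's dfs.data.dir):

  # On one datanode: how much space the HDFS block directory actually occupies
  # on disk, to compare against what dfsadmin -report claims for that node.
  du -sh /data/hadoop/dfs/data

  # From a client: histogram of per-file replication factors as the namenode
  # sees them. The second column of a recursive listing is the replication
  # factor for files; directory lines (permissions starting with 'd') are skipped.
  hadoop fs -lsr / | grep -v '^d' | awk '{print $2}' | sort | uniq -c

If the du totals per node line up with the namenode's per-node "DFS Used" figures,
the extra space is inside HDFS (e.g. over-replicated or orphaned blocks) rather
than non-HDFS data on the same disks.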