Hey Nick,

What's the output of this:
hadoop dfsadmin -report | grep "Non DFS Used" | grep -v "0 KB" | awk '{sum += $4} END {print sum}'

What version of Hadoop is this?

Brian

On Dec 9, 2009, at 10:25 PM, Nick Bailey wrote:

> Output from bottom of fsck report:
>
>  Total size:    8711239576255 B (Total open files size: 3571494 B)
>  Total dirs:    391731
>  Total files:   2612976 (Files currently being written: 3)
>  Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
>  Minimally replicated blocks:   2274747 (100.0 %)
>  Over-replicated blocks:        75491 (3.3186548 %)
>  Under-replicated blocks:       36945 (1.6241367 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.017153
>  Corrupt blocks:                0
>  Missing replicas:              36945 (0.53830105 %)
>  Number of data-nodes:          25
>  Number of racks:               1
>
>
> Output from top of dfsadmin -report:
>
>  Total raw bytes: 110689488793600 (100.67 TB)
>  Remaining raw bytes: 46994184353977 (42.74 TB)
>  Used raw bytes: 55511654282643 (50.49 TB)
>  % used: 50.15%
>
>  Total effective bytes: 0 (0 KB)
>  Effective replication multiplier: Infinity
>
>
> Not sure what the last two lines of the dfsadmin report mean, but we have a
> negligible amount of over-replicated blocks according to fsck. The rest of the
> dfsadmin report confirms what the web interface says in that the nodes have
> way more data than 8.6 TB * 3.
>
> Thoughts?
>
>
>
> -----Original Message-----
> From: "Brian Bockelman" <bbock...@cse.unl.edu>
> Sent: Wednesday, December 9, 2009 3:35pm
> To: common-user@hadoop.apache.org
> Cc: core-u...@hadoop.apache.org
> Subject: Re: Hadoop dfs usage and actual size discrepancy
>
> Hey Nick,
>
> Try:
>
> hadoop fsck /
> hadoop dfsadmin -report
>
> Should give you information about, for example, the non-HDFS data and the
> average replication factor.
>
> Or is this how you determined you had a replication factor of 3?
>
> Brian
>
> On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:
>
>> We have a hadoop cluster with a 100 TB capacity, and according to the dfs web
>> interface we are using 50% of our capacity (50 TB). However, doing 'hadoop fs
>> -dus /' says the total size of everything is about 8.6 TB. Everything has a
>> replication factor of 3, so we should only be using around 26 TB of our
>> cluster.
>>
>> I've verified the replication factors and I've also checked the datanode
>> machines to see if something non-hadoop-related is accidentally being stored
>> on the drives hadoop is using for storage, but nothing is.
>>
>> Has anyone had a similar problem and have any debugging suggestions?
>>
>> Thanks,
>> Nick Bailey
>>
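
A minimal cross-check sketch for the numbers above, assuming the datanodes keep
their blocks under /data/hadoop/dfs/data (that path is hypothetical; substitute
your cluster's dfs.data.dir):

  # On one datanode: how much space the HDFS block directory actually occupies
  # on disk, to compare against what dfsadmin -report claims for that node.
  du -sh /data/hadoop/dfs/data

  # From a client: histogram of per-file replication factors as the namenode
  # sees them. The second column of a recursive listing is the replication
  # factor for files; directory lines (permissions starting with 'd') are skipped.
  hadoop fs -lsr / | grep -v '^d' | awk '{print $2}' | sort | uniq -c

If the du totals per node line up with the namenode's per-node "DFS Used" figures,
the extra space is inside HDFS (e.g. over-replicated or orphaned blocks) rather
than non-HDFS data on the same disks.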