Output from bottom of fsck report:

 Total size:    8711239576255 B (Total open files size: 3571494 B)
 Total dirs:    391731
 Total files:   2612976 (Files currently being written: 3)
 Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
 Minimally replicated blocks:   2274747 (100.0 %)
 Over-replicated blocks:        75491 (3.3186548 %)
 Under-replicated blocks:       36945 (1.6241367 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.017153
 Corrupt blocks:                0
 Missing replicas:              36945 (0.53830105 %)
 Number of data-nodes:          25
 Number of racks:               1

Output from top of dfsadmin -report:

 Total raw bytes: 110689488793600 (100.67 TB)
 Remaining raw bytes: 46994184353977 (42.74 TB)
 Used raw bytes: 55511654282643 (50.49 TB)
 % used: 50.15%

 Total effective bytes: 0 (0 KB)
 Effective replication multiplier: Infinity

Not sure what the last two lines of the dfsadmin report mean, but we have a negligible amount of over-replicated blocks according to fsck. The rest of the dfsadmin report confirms what the web interface says: the nodes hold far more data than 8.6TB * 3.

Thoughts?

-----Original Message-----
From: "Brian Bockelman" <bbock...@cse.unl.edu>
Sent: Wednesday, December 9, 2009 3:35pm
To: common-user@hadoop.apache.org
Cc: core-u...@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Hey Nick,

Try:

hadoop fsck /
hadoop dfsadmin -report

Should give you information about, for example, the non-HDFS data and the average replication factor.

Or is this how you determined you had a replication factor of 3?

Brian

On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:

> We have a hadoop cluster with a 100TB capacity, and according to the dfs web
> interface we are using 50% of our capacity (50TB). However doing 'hadoop fs
> -dus /' says the total size of everything is about 8.6TB. Everything has a
> replication factor of 3 so we should only be using around 26TB of our cluster.
>
> I've verified the replication factors and I've also checked the datanode
> machines to see if something non hadoop related is accidentally being stored
> on the drives hadoop is using for storage, but nothing is.
>
> Has anyone had a similar problem and have any debugging suggestions?
>
> Thanks,
> Nick Bailey
>
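
For reference, the arithmetic behind the discrepancy can be sketched in Python using the figures pasted above. This is only a rough check, not anything Hadoop provides: it assumes fsck's "Total size" is the logical size before replication and that "Used raw bytes" counts every replica. The variable and function names are made up for illustration.

```python
# Figures copied from the fsck and dfsadmin output in this thread.
TOTAL_SIZE_BYTES = 8711239576255   # fsck "Total size" (assumed: logical size, before replication)
AVG_REPLICATION = 3.017153         # fsck "Average block replication"
USED_RAW_BYTES = 55511654282643    # dfsadmin "Used raw bytes" (assumed: counts all replicas)

# The "TB" values printed by dfsadmin here are consistent with 2**40 bytes.
TIB = 2 ** 40


def expected_raw_bytes(logical_bytes, replication):
    """Raw bytes we would expect on disk if replication fully explained usage."""
    return logical_bytes * replication


expected = expected_raw_bytes(TOTAL_SIZE_BYTES, AVG_REPLICATION)
unaccounted = USED_RAW_BYTES - expected   # raw bytes not explained by replication
ratio = USED_RAW_BYTES / expected         # how many times larger actual usage is

print(f"expected raw usage: {expected / TIB:.2f} TB")
print(f"reported raw usage: {USED_RAW_BYTES / TIB:.2f} TB")
print(f"unaccounted:        {unaccounted / TIB:.2f} TB")
print(f"reported/expected:  {ratio:.2f}x")
```

If those assumptions hold, reported usage is roughly double what replication alone would account for, which is the gap being asked about.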