We have a Hadoop cluster with 100TB of capacity, and according to the DFS web interface we are using 50% of it (50TB). However, running 'hadoop fs -dus /' says the total size of everything in HDFS is only about 8.6TB. Everything has a replication factor of 3, so we should only be using around 26TB of the cluster (8.6TB x 3).
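In case it's useful, here's roughly how I've been checking the numbers (the data directory path below is just an example; ours comes from the dfs.data.dir setting in our config):

    # Logical size of everything in HDFS, before replication
    hadoop fs -dus /

    # Per-datanode breakdown of configured capacity,
    # DFS Used, and Non DFS Used
    hadoop dfsadmin -report

    # The fsck summary reports average block replication
    # and any over-replicated blocks
    hadoop fsck /

    # On each datanode: what's actually on disk under the
    # data directory (example path)
    du -sh /data/hadoop/dfs/data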
I've verified the replication factors, and I've also checked the datanode machines to see whether something non-Hadoop-related is accidentally being stored on the drives Hadoop uses for storage, but nothing is. Has anyone run into a similar problem, or have any debugging suggestions?

Thanks,
Nick Bailey