On Fri, May 13, 2011 at 10:40 AM, Kester, Scott <skes...@weather.com> wrote:
> We have an 11 node Hadoop cluster running 20.2 that has been in
> production for 15 months now. The system is used to process log files
> that are ingested daily, and the oldest files in the HDFS are deleted to
> free up space as needed, typically when the free space is less than 10%
> (the delete is done using 'hadoop fs -rmr' on the parent directory of
> the files to be deleted). When the HDFS was originally built it had 1TB
> of 'Non DFS' space out of the 20TB total. This 1TB stayed constant for
> at least the first year the system was in use.
>
> However, over the last few weeks I have seen the 'Non DFS Used' as
> reported by the NameNode dfshealth.jsp page grow to 2G and rising. The
> total number of files/directories and blocks in use has remained fairly
> constant over this time. I am concerned that the Non DFS Used is going
> to consume more and more of the HDFS if left unchecked. Running fsck
> gave "The filesystem under path '/' is HEALTHY".
>
> Questions:
>
> 1) What exactly is hadoop reporting as 'Non DFS Used', and how is it
> calculated? Are these files on the same partition(s) as the HDFS files,
> but are not actually part of the HDFS?
>

Yes - it's usage reported by "df" that isn't coming from HDFS blocks.

> 2) Any ideas on what is driving the growth in Non DFS Used space? I
> looked for things like growing log files on the datanodes but didn't
> find anything.
>

Logs are one possible culprit. Another is to look for old files that
might be orphaned in your mapred.local.dir - there have been bugs in the
past where we've leaked files. If you shut down the TaskTrackers, you can
safely delete everything from within mapred.local.dir.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
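
As a rough illustration of the answer to question 1, here is a minimal Python sketch of how 'Non DFS Used' can be derived, assuming it is the configured capacity minus the space taken by HDFS blocks minus the space still free (which matches the "df usage that isn't HDFS blocks" description, but is an assumption about the exact calculation rather than something confirmed in this thread). The helper names `non_dfs_used` and `dir_size` are hypothetical, and `dir_size` is just one way to measure candidate directories such as the log directory or mapred.local.dir on a datanode.

```python
import os


def non_dfs_used(capacity, dfs_used, remaining):
    """Space on the data volumes used by anything other than HDFS blocks.

    Assumed formula: capacity - dfs_used - remaining, floored at zero.
    All three arguments must be in the same unit (bytes, GB, TB, ...).
    """
    return max(capacity - dfs_used - remaining, 0)


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total
```

Running `dir_size` against each candidate directory on every datanode and comparing the totals over a few days would show which one is growing.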