On Wed, Jul 7, 2010 at 9:48 AM, Michael Segel <[email protected]> wrote:
>
> Non DFS used tends to be logging or some other information on the disk.
> So you can't use hadoop commands to remove the files from the disk.
>
>> Date: Wed, 7 Jul 2010 17:11:38 +0900
>> Subject: How do I remove "Non DFS Used"?
>> From: [email protected]
>> To: [email protected]
>>
>> I was looking at the web interface and found that some of my nodes have
>> an enormous amount of "Non DFS Used".
>>
>> There is even a node with 800GB of "Non DFS Used", which is just ridiculous.
>>
>> I tried to remove it by running:
>>
>> "hadoop namenode -format"
>>
>> and I also tried deleting "hadoop.tmp.dir" (in my case,
>> /home/hadoop/hadoop_storage/tmp/).
>>
>> But when I start my cluster again, there it is again with thousands of
>> gigabytes of "Non DFS Used".
>>
>> Can anyone tell me what "Non DFS Used" is and how to remove it forever?
>>
>> Thanks in advance.
I always suggest running tune2fs -m 2:

http://old.nabble.com/Optimal-Filesystem-(and-Settings)-for-HDFS-td23600272.html

On a 1 TB disk this frees up about 30 GB. If you have been running for a
while, another thing you can do is check your TaskTracker directories for
relics. I find distributed cache jars and task attempts that do not get
cleaned up (with my version, all the time); then I use "find -mtime +7" to
locate the stale files and remove them.
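A rough sketch of both steps. The device name /dev/sdb1 and the mapred.local.dir path are assumptions for illustration; substitute your own. The find part is demonstrated on a throwaway temp directory so it is safe to run as-is:

```shell
# 1) Lower the ext2/ext3 reserved-block percentage from the default 5% to 2%
#    on the data partition (assumed here to be /dev/sdb1). On a 1 TB disk
#    this reclaims roughly 30 GB for HDFS. Requires root, so shown commented:
# tune2fs -m 2 /dev/sdb1

# 2) Clean up TaskTracker relics older than 7 days. Demonstrated on a scratch
#    directory; point this at your real mapred.local.dir once you have
#    reviewed the output of the -print pass.
MAPRED_LOCAL=$(mktemp -d)
touch -t 202001010000 "$MAPRED_LOCAL/old_distcache.jar"  # stale relic
touch "$MAPRED_LOCAL/fresh_attempt.log"                  # recent file, kept

# List candidates first, then delete only regular files older than 7 days.
find "$MAPRED_LOCAL" -type f -mtime +7 -print
find "$MAPRED_LOCAL" -type f -mtime +7 -delete

ls "$MAPRED_LOCAL"        # the recent file survives
rm -rf "$MAPRED_LOCAL"    # clean up the demo directory
```

Restricting to -type f avoids find trying to delete non-empty directories; for whole task-attempt directory trees you would review and remove them explicitly.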
