Again, where are you seeing the attemptid directories: at mapred/local/<attemptid> or at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>? If you are seeing them at mapred/local/<attemptid>, then it is a bug; please raise a JIRA and attach the TaskTracker logs if possible. Otherwise, note that mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories are cleaned up on a KillTaskAction, and mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a KillJobAction. Can you verify from the TaskTracker logs whether the attemptid got a KillTaskAction or the jobid got a KillJobAction? If not, this is fixed by HADOOP-5247.
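
To check the TaskTracker logs for both actions you need the jobid as well as the attemptid. A minimal Python sketch that derives the jobid and task id from an attempt directory name, assuming the usual attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number> naming. The helper names (parse_attempt_id, AttemptId) are hypothetical and not part of Hadoop:

```python
import re
from typing import NamedTuple

# Assumed layout: attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


class AttemptId(NamedTuple):
    jobtracker_id: str   # JobTracker identifier, e.g. "200904262046"
    job_number: str      # zero-padded job number, e.g. "0026"
    task_type: str       # "m" for a map task, "r" for a reduce task
    task_number: str     # zero-padded task number, e.g. "000002"
    attempt_number: int  # 0 for the first attempt of the task

    @property
    def job_id(self) -> str:
        # Name of the <jobid> directory under taskTracker/jobCache
        return f"job_{self.jobtracker_id}_{self.job_number}"

    @property
    def task_id(self) -> str:
        return f"task_{self.jobtracker_id}_{self.job_number}_{self.task_type}_{self.task_number}"


def parse_attempt_id(name: str) -> AttemptId:
    """Split an attempt directory name into its components (hypothetical helper)."""
    match = ATTEMPT_RE.match(name)
    if match is None:
        raise ValueError(f"not an attempt ID: {name!r}")
    jt, job, task_type, task, attempt = match.groups()
    return AttemptId(jt, job, task_type, task, int(attempt))


if __name__ == "__main__":
    attempt = parse_attempt_id("attempt_200904262046_0026_m_000002_0")
    print(attempt.job_id)   # job_200904262046_0026
    print(attempt.task_id)  # task_200904262046_0026_m_000002
```

With the jobid in hand you can search the TaskTracker log for both IDs; the exact log message wording is not shown in this thread, so the search itself is left to you.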

Thanks
Amareshwari

Sandhya E wrote:
Hi Amareshwari

We are on version 0.18. I verified from the JobTracker web UI that not
all killed tasks have leftovers in mapred/local. Also, some tasks
that were successful have left their tmp folders in mapred/local.

Can you please give some pointers on how to debug this further?

Regards
Sandhya

On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
<amar...@yahoo-inc.com> wrote:
Hi Sandhya,

Which version of Hadoop are you using? Before 0.17, there could be
<attempt_id> directories directly in mapred/local. Now there should not be
any such directories: from version 0.17 onwards, the attempt directories are
present only at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you
are seeing the directories in any other location, then it seems like a bug.
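
A minimal Python sketch of that location rule, assuming a 0.17+ TaskTracker and a local root such as <hadoop-tmp-dir>/mapred/local. The function names are hypothetical helpers, and the scan only lists directories; it never deletes anything:

```python
import os
from pathlib import PurePosixPath


def classify_attempt_dir(local_root: str, path: str) -> str:
    """Classify an attempt directory by where it sits under mapred/local.

    "expected": taskTracker/jobCache/<jobid>/<attemptid> (0.17 onwards)
    "stray":    <attemptid> directly under mapred/local (pre-0.17 layout)
    "other":    anything else, including paths outside local_root
    """
    try:
        parts = PurePosixPath(path).relative_to(local_root).parts
    except ValueError:
        return "other"  # path is not under local_root at all
    if len(parts) == 1 and parts[0].startswith("attempt_"):
        return "stray"
    if (len(parts) == 4
            and parts[:2] == ("taskTracker", "jobCache")
            and parts[2].startswith("job_")
            and parts[3].startswith("attempt_")):
        return "expected"
    return "other"


def find_stray_attempt_dirs(local_root: str) -> list[str]:
    """List attempt_* directories sitting directly under local_root (read-only scan)."""
    with os.scandir(local_root) as entries:
        return sorted(e.path for e in entries
                      if e.is_dir() and e.name.startswith("attempt_"))
```

On 0.17 or later, any path returned by find_stray_attempt_dirs is in the location that would indicate a bug.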

HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
does not touch files on the local file system.

Thanks
Amareshwari
Edward J. Yoon wrote:
Hi,

It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.

On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sandhyabhas...@gmail.com>
wrote:

Hi

Under <hadoop-tmp-dir>/mapred/local there are directories like
"attempt_200904262046_0026_m_000002_0".
Each of these directories contains files named intermediate.1,
intermediate.2, intermediate.3, intermediate.4 and intermediate.5.
There are many such directories, and all of them correspond to
killed task attempts. Because they contain huge intermediate files, we
ran into disk space issues.

They are cleaned up when the mapred cluster is restarted, but how can
they be cleaned up without having to restart the cluster?

Conf parameter "keep.failed.task.files" is set to "false" in our case.

Many Thanks
Sandhya
