Re: intermediate files of killed tasks not purged

Amareshwari Sriramadasu Tue, 28 Apr 2009 02:06:10 -0700

Again, where are you seeing the attemptid directories? Are they at mapred/local/<attemptid> or at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>? If you are seeing files at mapred/local/<attemptid>, then it is a bug. Please raise a JIRA and attach the TaskTracker logs if possible.

If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories are cleaned up on a KillTaskAction, and mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a KillJobAction. Can you verify from the TaskTracker logs whether the attemptid got a KillTaskAction or the jobid got a KillJobAction? If not, this is fixed by HADOOP-5247.
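The log check suggested above could be sketched roughly as follows in Python. This is only an illustration under assumptions: it assumes the TaskTracker log mentions the action name ("KillTaskAction" or "KillJobAction") on the same line as the attempt id or job id, which may not hold for every version, and the function names are hypothetical. The job id is derived from the attempt id by keeping the timestamp and job number fields.

```python
import re

# attempt_<jobtracker start time>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_[mr]_\d+_\d+")


def job_id_for(attempt_id):
    """Derive the job id (job_<timestamp>_<number>) from an attempt id."""
    match = ATTEMPT_RE.fullmatch(attempt_id)
    if match is None:
        raise ValueError("not a task attempt id: %r" % attempt_id)
    return "job_%s_%s" % (match.group(1), match.group(2))


def find_kill_lines(log_lines, attempt_id):
    """Return log lines that mention a kill action for the attempt or its job.

    Assumption: the action name and the id appear as plain text on one line.
    """
    job_id = job_id_for(attempt_id)
    hits = []
    for line in log_lines:
        if "KillTaskAction" in line and attempt_id in line:
            hits.append(line.rstrip("\n"))
        elif "KillJobAction" in line and job_id in line:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, job_id_for("attempt_200904262046_0026_m_000002_0") returns "job_200904262046_0026". Passing an open TaskTracker log file as log_lines returns the matching lines; an empty result would suggest that neither action was logged for that attempt.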


Thanks
Amareshwari


Sandhya E wrote:

Hi Amareshwari

We are on version 0.18. I verified from the JobTracker web page that not
all killed tasks have leftovers in mapred/local. Also, some tasks that
were successful have left their tmp folders in mapred/local.

Can you please give some pointers on how to debug this further?

Regards
Sandhya

On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
<amar...@yahoo-inc.com> wrote:

Hi Sandhya,

Which version of Hadoop are you using? Before 0.17, there could be
<attempt_id> directories in mapred/local. Now there should not be any such
directories.
From version 0.17 onwards, the attempt directories will be present only at
mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are seeing the
directories in any other location, then it seems like a bug.

HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
does not change local file system files.

Thanks
Amareshwari

Edward J. Yoon wrote:

Hi,

It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.

On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sandhyabhas...@gmail.com>
wrote:

Hi

Under <hadoop-tmp-dir>/mapred/local there are directories like
"attempt_200904262046_0026_m_000002_0"
Each of these directories contains files named like: intermediate.1
intermediate.2  intermediate.3  intermediate.4  intermediate.5
There are many directories of this kind. All of them correspond to
killed task attempts. As they contain huge intermediate files, we
ran into disk space issues.

They are cleaned up when the mapred cluster is restarted. But how can
they be cleaned up without having to restart the cluster?
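As a starting point for sizing the problem, the stray directories described above could be listed with a short Python sketch like the one below. It is only an illustration, not an established cleanup procedure: it reports directories named like task attempts that sit directly under the local directory, together with their sizes, and deletes nothing. The function names are hypothetical, and the caller supplies the path to <hadoop-tmp-dir>/mapred/local.

```python
import os
import re

# Directory names such as attempt_200904262046_0026_m_000002_0
ATTEMPT_DIR_RE = re.compile(r"attempt_\d+_\d+_[mr]_\d+_\d+")


def dir_size_bytes(path):
    """Sum the sizes of regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may vanish while a task is still running.
                pass
    return total


def stray_attempt_dirs(local_dir):
    """List (name, size in bytes) for attempt dirs directly under local_dir."""
    found = []
    for name in sorted(os.listdir(local_dir)):
        full = os.path.join(local_dir, name)
        if os.path.isdir(full) and ATTEMPT_DIR_RE.fullmatch(name):
            found.append((name, dir_size_bytes(full)))
    return found
```

The report is deliberately read-only. Before removing anything by hand, it would be wise to confirm on the JobTracker that the job each attempt belongs to has finished, since a running task may still be writing to its directory.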

The conf parameter "keep.failed.task.files" is set to "false" in our case.

Many Thanks
Sandhya
