[ https://issues.apache.org/jira/browse/HADOOP-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652464#action_12652464 ]

Devaraj Das commented on HADOOP-4654:
-------------------------------------

For 0.19 (and the trunk), I am worried about the approach where we spawn a 
commit task for every successful task. The load on the JT, in terms of memory 
among other things, would simply double. There are other issues too, such as 
the need to associate the commit task with the real task (since a task's 
success/failure would be governed by the success/failure of its commit task), 
and so on.
So for 0.19 (and the trunk), I propose that we keep the existing model of 
running the commit task (OutputCommitter.commitTask) in the same task context 
and only spawn new tasks for cleaning up the outputs of failed/killed tasks 
(for running OutputCommitter.abortTask). Presumably there won't be many failed 
tasks, so the load on the JT shouldn't increase much. We can also probably stop 
launching cleanup tasks for failed/killed tasks once the job succeeds/fails 
(which also means that the job-level cleanup task 
(OutputCommitter.cleanupJob) has run), on the assumption that the job-level 
cleanup has removed all the garbage.
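To make the proposed policy concrete, here is a minimal sketch of the JT-side decision, under stated assumptions: the class, enum and method names below are hypothetical and are not real Hadoop APIs. It only models the rule described above: successful attempts already ran commitTask in their own context, failed/killed attempts get one extra cleanup task (abortTask), and no further cleanup tasks are launched once job-level cleanup has run.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed JobTracker-side policy; none of these
// names are real Hadoop classes or methods.
class CleanupPolicySketch {

    enum AttemptOutcome { SUCCEEDED, FAILED, KILLED }

    // Set once the job-level cleanup (OutputCommitter.cleanupJob) has run.
    private boolean jobCleanupDone = false;

    // Attempts for which a separate abortTask cleanup task was scheduled.
    private final Set<String> scheduledCleanups = new HashSet<>();

    /**
     * Called when a task attempt finishes. Successful attempts have already
     * run commitTask in their own task context, so no extra task is needed.
     * Returns true if a separate cleanup task (running abortTask) is launched.
     */
    boolean onAttemptFinished(String attemptId, AttemptOutcome outcome) {
        if (outcome == AttemptOutcome.SUCCEEDED) {
            return false; // commitTask already ran in the same task context
        }
        if (jobCleanupDone) {
            return false; // assume cleanupJob already removed all garbage
        }
        // Set.add returns false if a cleanup was already scheduled for this attempt.
        return scheduledCleanups.add(attemptId);
    }

    /** Called once the job has succeeded or failed and cleanupJob has run. */
    void onJobCleanupDone() {
        jobCleanupDone = true;
    }

    int cleanupTasksLaunched() {
        return scheduledCleanups.size();
    }
}
```

The point of this shape is that the extra JT bookkeeping scales with the number of failed/killed attempts rather than with the total number of tasks, which is the load argument made above.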
Thoughts?

> remove temporary output directory of failed tasks
> -------------------------------------------------
>
>                 Key: HADOOP-4654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4654
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2, 0.18.1
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4654-0.18.txt
>
>
> When DFS is getting full (80+% of reserved space), the rate of write failures 
> increases, so more map-reduce tasks can fail. Because the temporary output 
> directories of failed tasks are not cleaned up, the situation worsens over the 
> lifetime of a job, increasing the probability of the whole job failing.
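As an illustration of what the abortTask-style cleanup buys here, the following is a minimal local-filesystem sketch, not Hadoop's implementation. It assumes a hypothetical layout where each attempt writes under `<output>/_temporary/<attemptId>`; the class and method names are made up for this example. It removes one failed attempt's temporary directory so its space is reclaimed instead of accumulating for the rest of the job.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem illustration of an abortTask-style cleanup.
// The "_temporary/<attemptId>" layout is an assumption for this sketch.
class TempOutputCleanupSketch {

    static Path attemptTempDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    /** Deletes the attempt's temp directory; returns true if it existed. */
    static boolean abortAttempt(Path outputDir, String attemptId) throws IOException {
        Path dir = attemptTempDir(outputDir, attemptId);
        if (!Files.exists(dir)) {
            return false; // nothing left behind by this attempt
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }
}
```

Only the failed attempt's own directory is removed; the shared `_temporary` parent and other attempts' output are left in place.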

-- 
This message is automatically generated by JIRA.