[ 
https://issues.apache.org/jira/browse/HADOOP-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650477#action_12650477
 ] 

Amareshwari Sriramadasu commented on HADOOP-4654:
-------------------------------------------------

The problem arises because the temporary output directories of failed tasks 
are cleaned up only at the end of the job.

Up to 0.18.X, task commit is done by the TaskCommitThread in the JobTracker 
(JT). The cleanup could be added there if it is needed for 0.18. 

After HADOOP-3150, task commit is exposed to the user. Now, each task commits 
at the end of its execution. To remove the temporary output directory of 
failed/killed tasks as soon as they fail, we should consider the following:
1. A failure/kill can happen anywhere between 'launching the task' and 
'committing the task'.
2. A failure/kill can be caused by a KillTaskAction or by an Exception/Error.

Owen's suggestion on HADOOP-3150 at 
http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12626372#action_12626372
and 
http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12628736#action_12628736
 _to have task commit as a separate task_ looks like the right approach here.
For successful tasks, a commit task will be launched to do the task commit. 
For failed/killed tasks, an abort task will be launched to do the task cleanup.
Thoughts?
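
To make the commit/abort split concrete, here is a rough sketch of the idea. 
It is only an illustration, not a patch: TaskCleanupSketch, TaskState, 
FollowUp, followUpFor and runFollowUp are hypothetical names, not existing 
Hadoop classes, and it uses java.nio on a local file system as a stand-in for 
DFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names are Hadoop APIs.
public class TaskCleanupSketch {

    public enum TaskState { SUCCEEDED, FAILED, KILLED }

    public enum FollowUp { COMMIT, ABORT }

    // Successful tasks get a commit task; failed/killed tasks get an abort task.
    public static FollowUp followUpFor(TaskState state) {
        return state == TaskState.SUCCEEDED ? FollowUp.COMMIT : FollowUp.ABORT;
    }

    // Runs the follow-up action against the attempt's temporary output directory.
    // finalDir must not exist yet when committing.
    public static void runFollowUp(FollowUp action, Path tempDir, Path finalDir)
            throws IOException {
        if (!Files.exists(tempDir)) {
            // The task may have died before creating any output: nothing to do.
            return;
        }
        if (action == FollowUp.COMMIT) {
            // Promote the temporary output to its final location.
            Files.move(tempDir, finalDir);
        } else {
            // Free the space right away instead of waiting for the end of the job.
            deleteRecursively(tempDir);
        }
    }

    // Deletes children before their parents by visiting paths in reverse order.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because the abort action only looks at the attempt's temporary directory, it 
does not matter at which point between launch and commit the task failed, or 
whether the cause was a KillTaskAction or an Exception/Error.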

> remove temporary output directory of failed tasks
> -------------------------------------------------
>
>                 Key: HADOOP-4654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4654
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2, 0.18.1
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When dfs is getting full (80+% of reserved space), the rate of write failures 
> increases, such that more map-reduce tasks can fail. By not cleaning up the 
> temporary output directory of tasks the situation worsens over the lifetime 
> of a job, increasing the probability of the whole job failing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
