[ 
https://issues.apache.org/jira/browse/HADOOP-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655183#action_12655183
 ] 

Amareshwari Sriramadasu commented on HADOOP-4759:
-------------------------------------------------

After discussion with Devaraj and Owen, I summarize the approach here:

* Child.java can have cleanup code in a finally block. This makes sure that 
the cleanup happens if the failure is because of an Exception/Error, which 
covers the majority of cases.
* Any other type of failure or kill of the attempt makes it FAILED_UNCLEAN or 
KILLED_UNCLEAN. The JobTracker will launch a separate cleanup task for 
FAILED_UNCLEAN and KILLED_UNCLEAN attempts. The cleanup task will take the 
attempt to FAILED or KILLED.
* The JT stops launching cleanup tasks for attempts once the job succeeds/fails. 
As Devaraj pointed out, this also means that the job-level cleanup task 
(OutputCommitter.cleanupJob) has run, and we assume that the job-level 
cleanup has cleaned up all the garbage.
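
The transitions described in the points above could be sketched roughly as follows. This is only an illustration, not the actual Child.java or JobTracker code: the State enum, runInChild, markUnclean, afterCleanupTask and shouldLaunchCleanup are hypothetical names, and running the cleanup only on failure inside the finally block is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class AttemptCleanupSketch {
    /** Hypothetical attempt states; only the *_UNCLEAN, FAILED and KILLED names come from the comment. */
    enum State { SUCCEEDED, FAILED_UNCLEAN, KILLED_UNCLEAN, FAILED, KILLED }

    /** Records which attempts were cleaned up, so the sketch can be checked. */
    static final List<String> cleaned = new ArrayList<>();

    /** Stand-in for the per-attempt cleanup (hypothetical). */
    static void cleanupAttempt(String attemptId) {
        cleaned.add(attemptId);
    }

    /** Child-side path: on an Exception/Error the finally block runs the cleanup. */
    static State runInChild(String attemptId, Runnable body) {
        boolean ok = false;
        try {
            body.run();
            ok = true;
            return State.SUCCEEDED;
        } catch (Throwable t) {
            // Cleanup runs in the finally block below, so the attempt ends up clean.
            return State.FAILED;
        } finally {
            if (!ok) {
                cleanupAttempt(attemptId); // assumption: cleanup only on failure
            }
        }
    }

    /** Any other failure or kill (the finally block never ran) leaves the attempt unclean. */
    static State markUnclean(boolean killed) {
        return killed ? State.KILLED_UNCLEAN : State.FAILED_UNCLEAN;
    }

    /** A successful cleanup task takes the attempt to FAILED or KILLED; otherwise the state is unchanged. */
    static State afterCleanupTask(State s, boolean cleanupSucceeded) {
        if (!cleanupSucceeded) {
            return s;
        }
        switch (s) {
            case FAILED_UNCLEAN: return State.FAILED;
            case KILLED_UNCLEAN: return State.KILLED;
            default: return s;
        }
    }

    /** The JT launches cleanup tasks only for unclean attempts of a job that has not finished. */
    static boolean shouldLaunchCleanup(State s, boolean jobFinished) {
        return !jobFinished && (s == State.FAILED_UNCLEAN || s == State.KILLED_UNCLEAN);
    }

    public static void main(String[] args) {
        State viaFinally = runInChild("attempt_a", () -> { throw new RuntimeException("task failed"); });
        State viaCleanupTask = afterCleanupTask(markUnclean(true), true);
        System.out.println(viaFinally + " " + viaCleanupTask + " " + cleaned);
    }
}
```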

Two approaches here:
1. We can use the same attempt for launching the cleanup. Here, the same 
attempt will be launched with its starting state as *_UNCLEAN instead of 
UNASSIGNED. When the cleanup is successful, it will go to FAILED or KILLED. 
If it fails, it will be left in the *_UNCLEAN state. 
We would need additional logic in the scheduler to handle retries, if needed. 

2. Have a separate TIP for doing the cleanup. Associate the cleanup TIP with 
the failed/killed attempt by passing the attempt_id through the configuration.
Once the TIP succeeds (it gets up to four attempts, by default), it will move 
the corresponding attempt to FAILED or KILLED. If the TIP fails, it will leave 
the attempt in the *_UNCLEAN state. 
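
To make the second approach a bit more concrete, here is a rough sketch of how a cleanup TIP could be tied to an attempt through configuration and retried. It is not Hadoop code: java.util.Properties stands in for the job configuration, and the key name, the attempt id, and the confFor, runCleanupTip and resolve helpers are all made up for illustration.

```java
import java.util.Properties;
import java.util.function.Predicate;

public class CleanupTipSketch {
    // Hypothetical key; the proposal only says the attempt_id is passed through configuration.
    static final String ATTEMPT_ID_KEY = "sketch.cleanup.attempt.id";
    // Four attempts by default, as mentioned in the proposal.
    static final int DEFAULT_MAX_ATTEMPTS = 4;

    /** Builds the configuration for a cleanup TIP tied to one failed/killed attempt. */
    static Properties confFor(String attemptId) {
        Properties conf = new Properties();
        conf.setProperty(ATTEMPT_ID_KEY, attemptId);
        return conf;
    }

    /** Tries the cleanup up to maxAttempts times; returns true as soon as one try succeeds. */
    static boolean runCleanupTip(Properties conf, Predicate<String> cleanup, int maxAttempts) {
        String attemptId = conf.getProperty(ATTEMPT_ID_KEY);
        for (int i = 0; i < maxAttempts; i++) {
            try {
                if (cleanup.test(attemptId)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // This cleanup try failed; fall through and retry.
            }
        }
        return false;
    }

    /** On TIP success the attempt drops the _UNCLEAN suffix; on TIP failure it keeps it. */
    static String resolve(String uncleanState, boolean tipSucceeded) {
        String suffix = "_UNCLEAN";
        if (tipSucceeded && uncleanState.endsWith(suffix)) {
            return uncleanState.substring(0, uncleanState.length() - suffix.length());
        }
        return uncleanState;
    }

    public static void main(String[] args) {
        Properties conf = confFor("attempt_x"); // hypothetical attempt id
        int[] calls = {0};
        // Simulated cleanup that only succeeds on the third try.
        boolean ok = runCleanupTip(conf, id -> ++calls[0] >= 3, DEFAULT_MAX_ATTEMPTS);
        System.out.println(resolve("FAILED_UNCLEAN", ok) + " after " + calls[0] + " tries");
    }
}
```

With the first approach, the retry loop above would instead have to live in the scheduler, since the same attempt is relaunched rather than a separate TIP.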

Thoughts?



> HADOOP-4654 to be fixed for branches >= 0.19
> --------------------------------------------
>
>                 Key: HADOOP-4759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.1, 0.20.0
>
>
> HADOOP-4654 is fixed only for branch 0.18.3. This JIRA looks at the issue 
> reported for branches 0.19 and above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
