[
https://issues.apache.org/jira/browse/HADOOP-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655183#action_12655183
]
Amareshwari Sriramadasu commented on HADOOP-4759:
-------------------------------------------------
After discussion with Devaraj and Owen, I summarize the approach here:
* Child.java can have cleanup code in a finally block. This makes sure that
cleanup happens when the failure is caused by an Exception/Error, which
covers the majority of cases.
* Any other kind of failure or kill of the attempt puts it in FAILED_UNCLEAN or
KILLED_UNCLEAN. The JobTracker will launch a separate cleanup task for
FAILED_UNCLEAN and KILLED_UNCLEAN attempts. The cleanup task will take the
attempt to FAILED or KILLED.
* The JT stops launching cleanup tasks for attempts once the job succeeds/fails.
As Devaraj pointed out, by then the job-level cleanup task
(OutputCommitter.cleanupJob) has also run, and we assume that the job-level
cleanup has cleaned up all the garbage.
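The first point could look roughly like the sketch below. This is not the actual Child.java code; ChildCleanupSketch, AttemptWork and runAttempt are hypothetical names used only for illustration. Skipping cleanup after a successful run is also an assumption of the sketch.

```java
// Hypothetical sketch; these names are illustrative and are not Hadoop's Child.java.
class ChildCleanupSketch {

    /** Stand-in for the work of one task attempt plus its cleanup hook. */
    interface AttemptWork {
        void run() throws Exception;
        void cleanup(); // e.g. discard the attempt's temporary output
    }

    /**
     * Runs the attempt and cleans up in the finally block if it did not succeed.
     * Returns true if the attempt completed normally, false if it threw.
     */
    static boolean runAttempt(AttemptWork work) {
        boolean succeeded = false;
        try {
            work.run();
            succeeded = true;
        } catch (Throwable t) {
            // Covers both Exception and Error raised inside the child JVM.
            // A real child would report the failure to its parent here.
        } finally {
            // Assumption of this sketch: cleanup is skipped after success.
            // This block is never reached if the JVM is killed from outside,
            // which is the case the *_UNCLEAN states are meant to cover.
            if (!succeeded) {
                work.cleanup();
            }
        }
        return succeeded;
    }
}
```

A kill from outside the JVM does not go through the finally block, which is why the separate cleanup task is still needed for the remaining cases.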
Two approaches here:
1. We can use the same attempt for launching the cleanup. Here, the same
attempt will be launched with *_UNCLEAN as its starting state, instead of
UNASSIGNED. When the cleanup is successful, it will go to FAILED or KILLED. If
it fails, it will be left in the *_UNCLEAN state.
We would need additional logic in the scheduler to handle retries, if needed.
2. Have a separate TIP for doing the cleanup. Associate the cleanup TIP with
the failed/killed attempt by passing the attempt_id through configuration.
Once the TIP succeeds (within the default four attempts), it will move
the corresponding attempt to FAILED or KILLED. If the TIP fails, it will leave
the attempt in the *_UNCLEAN state.
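Both approaches end up with the same state transitions for the attempt. The sketch below shows only those transitions. CleanupStateSketch, AttemptState, shouldLaunchCleanup and afterCleanup are hypothetical names, not existing JobTracker code, and the state list is limited to the states mentioned above.

```java
// Hypothetical sketch of the proposed transitions; not existing JobTracker code.
class CleanupStateSketch {

    /** Only the attempt states named in this comment. */
    enum AttemptState { UNASSIGNED, FAILED_UNCLEAN, KILLED_UNCLEAN, FAILED, KILLED }

    static boolean isUnclean(AttemptState s) {
        return s == AttemptState.FAILED_UNCLEAN || s == AttemptState.KILLED_UNCLEAN;
    }

    /** The JT launches cleanup only for *_UNCLEAN attempts of a job that is still running. */
    static boolean shouldLaunchCleanup(AttemptState s, boolean jobFinished) {
        return isUnclean(s) && !jobFinished;
    }

    /**
     * State of the attempt after its cleanup has been tried.
     * A successful cleanup moves *_UNCLEAN to FAILED or KILLED;
     * a failed cleanup leaves the attempt where it was.
     */
    static AttemptState afterCleanup(AttemptState s, boolean cleanupSucceeded) {
        if (!cleanupSucceeded) {
            return s;
        }
        switch (s) {
            case FAILED_UNCLEAN: return AttemptState.FAILED;
            case KILLED_UNCLEAN: return AttemptState.KILLED;
            default:             return s; // nothing to clean up
        }
    }
}
```

The two approaches differ only in what runs the cleanup (the same attempt relaunched, or a separate TIP) and in where the retry logic lives.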
Thoughts?
> HADOOP-4654 to be fixed for branches >= 0.19
> --------------------------------------------
>
> Key: HADOOP-4759
> URL: https://issues.apache.org/jira/browse/HADOOP-4759
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.1, 0.20.0
>
>
> HADOOP-4654 is fixed only for branch 18.3. This JIRA looks at the issue as
> reported for branches 0.19 and above.