[ https://issues.apache.org/jira/browse/HADOOP-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679548#action_12679548 ]
Hemanth Yamijala commented on HADOOP-5327:
------------------------------------------
I think this patch should also add the job's system directory to the cleanup
thread in the code path where job submission fails due to ACLs. In the majority
of cases, this alone will prevent the problem from occurring in the first
place. However, it only supplements the changes in the patch, which are still
needed to cover cases where the JobTracker is restarted before the cleanup
thread has had a chance to delete the system directory completely.
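To make the suggestion concrete, here is a minimal sketch of the idea. It is
not the actual patch: every name in it (AclCleanupSketch, checkAccess,
submitJob, CLEANUP_QUEUE) is a hypothetical stand-in rather than JobTracker
code, and a local directory stands in for the job's directory on the DFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names are Hadoop APIs.
class AclCleanupSketch {
    // Stand-in for the queue that the cleanup thread drains.
    static final BlockingQueue<Path> CLEANUP_QUEUE = new LinkedBlockingQueue<>();

    // Stand-in for the queue ACL check: rejects users not in the allowed set.
    static void checkAccess(String user, Set<String> allowedUsers) throws IOException {
        if (!allowedUsers.contains(user)) {
            throw new IOException("Access denied for user " + user);
        }
    }

    // Submission path: on an ACL failure, hand the job's directory to the
    // cleanup queue before propagating the error to the caller.
    static void submitJob(String user, Set<String> allowedUsers, Path jobDir)
            throws IOException {
        try {
            checkAccess(user, allowedUsers);
        } catch (IOException e) {
            CLEANUP_QUEUE.add(jobDir);
            throw e;
        }
        // A real submission would initialize and schedule the job here.
    }

    // One pass of the cleanup thread: delete every queued directory.
    static void drainCleanupQueue() throws IOException {
        Path dir;
        while ((dir = CLEANUP_QUEUE.poll()) != null) {
            deleteRecursively(dir);
        }
    }

    // Deletes a directory tree; a missing directory is treated as already clean.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Note that the sketch does nothing for the restart case: if the process stops
after the directory is queued but before it is deleted, the files survive,
which is why the recovery-side changes in the patch are still needed.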
Regarding cleanup, there seem to be two different cases here:
- The job was never submitted in the first place
- The job was already running, and after a restart it can no longer
run because the ACLs were changed.
I think the patch handles the first case cleanly (once the comments are
incorporated). In the second case, ideally the job should be killed by the
JobTracker so that everything related to the job (system directory, running
tasks, cleanup task, etc.) is cleaned up properly. I think handling the
second case (which should be rare) belongs in a separate JIRA. Thoughts?
> Job files for a job failing because of ACLs are not clean from the system
> directory
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-5327
> URL: https://issues.apache.org/jira/browse/HADOOP-5327
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Karam Singh
> Assignee: Amar Kamat
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: HADOOP-5327-v2.3.patch
>
>
> Jobs which failed because of ACLs get added during JT restart recovery