[ 
https://issues.apache.org/jira/browse/HADOOP-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635317#action_12635317
 ] 

Amar Kamat commented on HADOOP-4261:
------------------------------------

A few comments w.r.t. job recovery:
1) Upon restart, the task-completion-events/task-reports for the setup tasks 
should also match those from before the restart.
2) It would make more sense to set the job run-state to {{SETUP}} when 
{{logInited()}} is invoked. While recovering, check whether the {{SETUP}} state 
was reached before calling {{init()}}.
3) Check if {{JobInProgress.obtainSetupTask()}} can reuse 
{{JobInProgress.addRunningTaskToTIP()}}.
4) I think {{JobInProgress.canLaunchSetupTask()}} can also be written as
{code}
private synchronized boolean canLaunchSetupTask() {
    // check that the job is in PREP, its tasks are initialized,
    // and the setup task has not been launched yet
    return status.getRunState() == JobStatus.PREP && tasksInited.get()
        && !launchedSetup;
}
{code}
5) I don't see any code that deals with the setup task in job recovery, i.e. in 
the recovery manager. Just make sure that the effect of scheduling setup tasks 
before the restart is the same as the effect of replaying them from history. I 
assume that when the JIP is given a task-attempt update, it figures out whether 
the task is a setup task or not. Ideally, the way setup is launched from the 
recovery manager should mimic the way it is invoked from the real (live) 
jobtracker.
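
To make points 4 and 5 concrete, here is a minimal, self-contained sketch. It 
is not the real JobTracker code: {{SketchJob}}, {{SetupRecoverySketch}}, 
{{launchSetup()}}, {{setupCompleted()}} and the {{SETUP_LAUNCHED}}/{{SETUP_COMPLETED}} 
record format are all hypothetical names made up for illustration. The only 
idea it tries to show is that the live path and the history replay go through 
the same guarded entry point, so both end up with the same events and run-state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for JobInProgress; NOT the real Hadoop class.
class SketchJob {
    enum RunState { PREP, RUNNING }

    private RunState runState = RunState.PREP;
    private final AtomicBoolean tasksInited = new AtomicBoolean(false);
    private boolean launchedSetup = false;
    // Stand-in for task-completion-events / history records.
    private final List<String> events = new ArrayList<>();

    synchronized void initTasks() {
        tasksInited.set(true);
    }

    // Same shape as the predicate suggested in point 4.
    synchronized boolean canLaunchSetupTask() {
        return runState == RunState.PREP && tasksInited.get()
            && !launchedSetup;
    }

    // Single entry point shared by the live scheduler and the replay.
    synchronized boolean launchSetup(String attemptId) {
        if (!canLaunchSetupTask()) {
            return false;
        }
        launchedSetup = true;
        events.add("SETUP_LAUNCHED:" + attemptId);
        return true;
    }

    // Assumption of this sketch: the job leaves PREP once setup completes.
    synchronized void setupCompleted(String attemptId) {
        if (!launchedSetup) {
            throw new IllegalStateException("setup was never launched");
        }
        runState = RunState.RUNNING;
        events.add("SETUP_COMPLETED:" + attemptId);
    }

    synchronized List<String> getEvents() {
        return new ArrayList<>(events);
    }

    synchronized RunState getRunState() {
        return runState;
    }
}

class SetupRecoverySketch {
    // What the live jobtracker would do before a restart.
    static SketchJob runLive() {
        SketchJob job = new SketchJob();
        job.initTasks();
        job.launchSetup("attempt_setup_0");
        job.setupCompleted("attempt_setup_0");
        return job;
    }

    // What a recovery manager would do: replay records through the
    // same methods instead of mutating job state directly.
    static SketchJob replayFromHistory(List<String> history) {
        SketchJob job = new SketchJob();
        job.initTasks();
        for (String record : history) {
            String[] parts = record.split(":", 2);
            if (parts[0].equals("SETUP_LAUNCHED")) {
                job.launchSetup(parts[1]);
            } else if (parts[0].equals("SETUP_COMPLETED")) {
                job.setupCompleted(parts[1]);
            }
        }
        return job;
    }

    public static void main(String[] args) {
        SketchJob live = runLive();
        SketchJob recovered = replayFromHistory(live.getEvents());
        System.out.println(live.getEvents().equals(recovered.getEvents()));
    }
}
```

Because the replay calls {{launchSetup()}} rather than setting flags itself, a 
recovered job cannot launch setup a second time, and its recorded events match 
the pre-restart ones (which is what point 1 asks for).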

> Jobs failing in the init stage will never cleanup
> -------------------------------------------------
>
>                 Key: HADOOP-4261
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4261
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: patch-4261.txt
>
>
> Pre HADOOP-3150, if the job fails in the init stage, {{job.kill()}} was 
> called. This used to make sure that the job was cleaned up w.r.t 
> - status set to KILLED/FAILED
> - job files from the system dir are deleted
> - closing of job history files
> - making jobtracker aware of this through {{jobTracker.finalizeJob()}}
> - cleaning up the data structures via {{JobInProgress.garbageCollect()}}
> Now if the job fails in the init stage, {{job.fail()}} is called which doesn't 
> do the cleanup. HADOOP-3150 introduces cleanup tasks which are launched once 
> the job completes i.e killed/failed/succeeded.  Jobtracker will never 
> consider this job for scheduling as the job will be in the {{PREP}} state 
> forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
