[ https://issues.apache.org/jira/browse/HADOOP-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635317#action_12635317 ]
Amar Kamat commented on HADOOP-4261:
------------------------------------

A few comments w.r.t. job recovery:

1) Upon restart, the task-completion events and task reports for the setup tasks should also match.
2) It would make more sense to mark the job's run-state as {{SETUP}} when {{logInited()}} is invoked. While recovering, check whether the SETUP state has been reached before calling {{init()}}.
3) Check whether {{JobInProgress.obtainSetupTask()}} can reuse {{JobInProgress.addRunningTaskToTIP()}}.
4) I think {{JobInProgress.canLaunchSetupTask()}} can also be written as
{code}
private synchronized boolean canLaunchSetupTask() {
  // the job is in PREP, its tasks are initialized and setup has not been launched yet
  return status.getRunState() == JobStatus.PREP && tasksInited.get() && !launchedSetup;
}
{code}
5) I don't see any code that deals with setup tasks in job recovery, i.e. in the recovery manager. Just make sure that the effect of scheduling setup tasks before a restart is the same as the effect of replaying them from history. I assume that when the JIP ({{JobInProgress}}) is given a task-attempt update, it figures out whether the task is a setup task or not. Ideally, the way setup is launched from the recovery manager should mimic the way it is invoked from the real (live) jobtracker.

> Jobs failing in the init stage will never cleanup
> -------------------------------------------------
>
>                 Key: HADOOP-4261
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4261
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: patch-4261.txt
>
>
> Pre HADOOP-3150, if the job fails in the init stage, {{job.kill()}} was
> called.
> This used to make sure that the job was cleaned up w.r.t.
> - status set to KILLED/FAILED
> - job files from the system dir are deleted
> - closing of job history files
> - making the jobtracker aware of this through {{jobTracker.finalizeJob()}}
> - cleaning up the data structures via {{JobInProgress.garbageCollect()}}
> Now if the job fails in the init stage, {{job.fail()}} is called, which doesn't
> do the cleanup. HADOOP-3150 introduces cleanup tasks, which are launched once
> the job completes, i.e. is killed, fails or succeeds. The jobtracker will never
> consider this job for scheduling, as the job will be in the {{PREP}} state
> forever.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
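To exercise the condition suggested in point 4 of the comment on its own, here is a minimal self-contained model. It is a sketch under stated assumptions. The class `SetupGate`, its `RunState` enum and the methods `initTasks()` and `obtainSetupTask()` are hypothetical stand-ins for `JobStatus`/`JobInProgress` state and are not Hadoop's actual API. Only the predicate inside `canLaunchSetupTask()` is taken from the comment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model of the state read by canLaunchSetupTask() in point 4.
 * RunState and these fields only mirror the names used in the comment;
 * they are not Hadoop's JobStatus/JobInProgress classes.
 */
public class SetupGate {
    enum RunState { PREP, RUNNING }

    private RunState runState = RunState.PREP;
    private final AtomicBoolean tasksInited = new AtomicBoolean(false);
    private boolean launchedSetup = false;

    /** Open only while the job is in PREP, its tasks are inited and setup is not yet launched. */
    synchronized boolean canLaunchSetupTask() {
        return runState == RunState.PREP && tasksInited.get() && !launchedSetup;
    }

    /** Hypothetical hook: called once task initialization has completed successfully. */
    void initTasks() {
        tasksInited.set(true);
    }

    /** Hypothetical: check-and-set under one lock, so at most one caller gets the setup task. */
    synchronized boolean obtainSetupTask() {
        if (!canLaunchSetupTask()) {
            return false;
        }
        launchedSetup = true;
        return true;
    }

    public static void main(String[] args) {
        SetupGate job = new SetupGate();
        System.out.println(job.canLaunchSetupTask()); // false: tasks not inited yet
        job.initTasks();
        System.out.println(job.obtainSetupTask());    // true: first launch
        System.out.println(job.obtainSetupTask());    // false: setup already launched
    }
}
```

In this model, the check and the `launchedSetup` update sit inside one synchronized method, so two concurrent callers cannot both obtain the setup task. Also, if initialization fails before `tasksInited` is set and nothing moves the job out of PREP, `canLaunchSetupTask()` stays false indefinitely. That matches the stuck-in-PREP symptom described in the issue.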