[ https://issues.apache.org/jira/browse/HADOOP-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5068:
------------------------------

    Attachment: HADOOP-5068-20090120-git.txt

I could reproduce the problem; it surfaces very often in consecutive runs. The 
reason for the (random) failure:

In FakeTaskTrackerManager.killJob(),
{code}
    public void killJob(JobID jobid) throws IOException {
      JobInProgress job = jobs.get(jobid);
      finalizeJob(job, JobStatus.KILLED);
      job.kill();
    }
{code}
If the job state goes back to RUNNING after the finalizeJob() call, job.kill() 
throws the exception posted above. This can happen when 
JobInitializationPoller calls FakeJobInProgress.initTasks() after finalizeJob() 
returns but before job.kill() starts.
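
To make the window concrete, here is a minimal sketch that forces the bad interleaving deterministically with latches. ToyJob and KillJobRaceSketch are hypothetical stand-ins, not the real JobInProgress or FakeTaskTrackerManager, and the rule that kill() throws on a RUNNING job is an assumption made only for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Toy stand-in for a job; NOT the real Hadoop JobInProgress.
class ToyJob {
    enum State { PREP, RUNNING, KILLED }
    private State state = State.PREP;

    synchronized State getState() { return state; }
    synchronized void setState(State s) { state = s; }

    // Mimics asynchronous initialization moving the job to RUNNING.
    synchronized void initTasks() { state = State.RUNNING; }

    // Assumption for this sketch: kill() rejects a job that is RUNNING.
    synchronized void kill() {
        if (state == State.RUNNING) {
            throw new IllegalStateException("kill() called on a RUNNING job");
        }
        state = State.KILLED;
    }
}

class KillJobRaceSketch {
    // Returns true if kill() threw because initTasks() ran between the two steps.
    static boolean killJobWithInterleavedInit() {
        ToyJob job = new ToyJob();
        CountDownLatch finalized = new CountDownLatch(1);
        CountDownLatch initialized = new CountDownLatch(1);

        // Plays the role of the initialization poller thread.
        Thread poller = new Thread(() -> {
            try {
                finalized.await();   // wait until the "finalizeJob" step is done
                job.initTasks();     // flips KILLED back to RUNNING
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                initialized.countDown();
            }
        });
        poller.start();

        try {
            job.setState(ToyJob.State.KILLED); // stands in for finalizeJob(job, KILLED)
            finalized.countDown();
            initialized.await();               // force the poller into the window
            try {
                job.kill();                    // second step of the old killJob()
                return false;
            } catch (IllegalStateException e) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("kill() failed: " + killJobWithInterleavedInit());
    }
}
```

In the real test nothing forces this ordering, which is why the failure only shows up on some runs.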

The failure mostly started showing up after the change that runs initTasks 
asynchronously via JobInitializationPoller went in.

Attaching a patch. It removes job.kill() from killJob(), as it is not actually 
needed. It also uses ControlledJobInitialization so that the initialization 
poller doesn't get in the way. I have run the test many times now and no 
longer see any failures.
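
A rough sketch of the shape of the two changes, using hypothetical toy classes rather than the code in the attached patch: killJob() only finalizes the job, and initialization happens only when the test explicitly asks for it on its own thread. SketchJob, ManualInitializer and SketchTaskTrackerManager are invented names, and the rule that only PREP jobs get initialized is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy job; NOT the real JobInProgress.
class SketchJob {
    enum State { PREP, RUNNING, KILLED }
    State state = State.PREP;
}

// Hypothetical test-driven initializer: nothing runs in the background.
class ManualInitializer {
    private final Queue<SketchJob> pending = new ArrayDeque<>();

    void submit(SketchJob job) { pending.add(job); }

    // Processes up to `count` waiting jobs on the caller's thread.
    // Assumption: only jobs still in PREP are moved to RUNNING.
    int initializeNext(int count) {
        int processed = 0;
        while (processed < count && !pending.isEmpty()) {
            SketchJob job = pending.remove();
            if (job.state == SketchJob.State.PREP) {
                job.state = SketchJob.State.RUNNING;
            }
            processed++;
        }
        return processed;
    }
}

class SketchTaskTrackerManager {
    private final Map<String, SketchJob> jobs = new HashMap<>();

    void addJob(String id, SketchJob job) { jobs.put(id, job); }

    // Single step: mark the job KILLED. No follow-up kill() call, so there
    // is no second step that could observe a changed state.
    void killJob(String id) {
        jobs.get(id).state = SketchJob.State.KILLED;
    }
}

class ControlledInitSketch {
    // Kill first, then let the test trigger an initialization pass.
    static SketchJob.State killThenInitialize() {
        SketchJob job = new SketchJob();
        ManualInitializer init = new ManualInitializer();
        SketchTaskTrackerManager manager = new SketchTaskTrackerManager();
        manager.addJob("job_1", job);
        init.submit(job);
        manager.killJob("job_1");
        init.initializeNext(1);
        return job.state;
    }

    // Initialize first, then kill.
    static SketchJob.State initializeThenKill() {
        SketchJob job = new SketchJob();
        ManualInitializer init = new ManualInitializer();
        SketchTaskTrackerManager manager = new SketchTaskTrackerManager();
        manager.addJob("job_1", job);
        init.submit(job);
        init.initializeNext(1);
        manager.killJob("job_1");
        return job.state;
    }

    public static void main(String[] args) {
        System.out.println(killThenInitialize());
        System.out.println(initializeThenKill());
    }
}
```

Because every state change happens on the test thread, either ordering leaves the job in a state the test can predict.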

> testClusterBlockingForLackOfMemory in TestCapacityScheduler fails randomly
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-5068
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5068
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Vinod K V
>         Attachments: HADOOP-5068-20090120-git.txt
>
>
> testClusterBlockingForLackOfMemory fails randomly when TestCapacityScheduler 
> is run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
