[
https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sreekanth Ramakrishnan updated HADOOP-4981:
-------------------------------------------
Attachment: HADOOP-4981-4.patch
Attaching new patch fixing test case issues:
The following steps are currently used to test high-RAM jobs with speculative
execution:
- Configure two task trackers, each with 3 GB vmem, 1 GB physical memory,
2 map slots and 2 reduce slots.
- Submit one high-memory job whose tasks require 2 GB vmem and which has one
map task, one speculative map task and no reduces.
- Submit another job with a low memory requirement of 100 MB vmem, which has
one map task and no reduce tasks.
- The first map of the first job is scheduled on tt1.
- Check that the cluster is blocked until the speculative task of the
high-memory job is scheduled.
- Once the high-memory job's speculative map is scheduled, the map task of the
normal job should be scheduled.
- Now submit a high-memory job (job3) which has a 2 GB vmem requirement and
has 1 map and 1 reduce, with a speculative reduce.
- Now submit a normal job (job4) which has a memory requirement of 100 MB.
- First high memory job's map is scheduled on tt1.
- When tt1 comes back to the scheduler asking for a task, it is blocked: the
scheduler tries to assign a task from job3, which is a high-memory job that
has already taken the map slot on the tracker, so there is not enough space
left.
- When tt2 comes back to the scheduler asking for a task, it is assigned the
map from job4.
- When tt2 comes back to the scheduler again, the reduce of job3 starts
running.
- The cluster is now blocked until the speculative reduce of job3 runs.
- Finish map tasks.
- Make tt1 come back to the scheduler; it is assigned the speculative reduce
task of job3.
- Make tt1 come back to the scheduler again; it is assigned the normal reduce
task of job4, as it fits in the remaining memory.
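The memory arithmetic behind the second scenario can be sketched as follows.
This is a minimal illustration, not scheduler code: the class
TrackerMemorySketch and its methods are hypothetical, and it assumes a tracker
is blocked purely on free vmem (3 GB per tracker, 2 GB for a high-memory task,
100 MB for a normal task), ignoring slot counts and physical memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one task tracker's vmem budget; not a Hadoop class. */
class TrackerMemorySketch {
    private final long totalVmemMb;
    private final List<Long> runningTaskVmemMb = new ArrayList<>();

    TrackerMemorySketch(long totalVmemMb) {
        this.totalVmemMb = totalVmemMb;
    }

    /** Free vmem is the total minus the requirement of every running task. */
    long freeVmemMb() {
        long used = 0;
        for (long mb : runningTaskVmemMb) {
            used += mb;
        }
        return totalVmemMb - used;
    }

    /** A task fits only if its vmem requirement does not exceed the free vmem. */
    boolean fits(long taskVmemMb) {
        return taskVmemMb <= freeVmemMb();
    }

    /** Records the task as running when it fits; otherwise leaves state untouched. */
    boolean tryAssign(long taskVmemMb) {
        if (!fits(taskVmemMb)) {
            return false;
        }
        runningTaskVmemMb.add(taskVmemMb);
        return true;
    }

    /** Models the "Finish map tasks" step by releasing all running tasks. */
    void finishAll() {
        runningTaskVmemMb.clear();
    }
}
```

With a 3 GB tracker, one 2 GB task leaves 1 GB free, so a second 2 GB task does
not fit. After finishAll(), a 2 GB speculative reduce fits again, and a 100 MB
reduce still fits next to it.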
{{FakeJobInProgress}} resets its {{hasSpeculativeMap}} and
{{hasSpeculativeReduce}} fields after a speculative task is given to the
scheduler.
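The reset behaviour can be illustrated roughly as below. This is a sketch
under assumptions, not the code in the patch: the class SpeculativeFlagSketch,
its method names and the placeholder task names are hypothetical; only the two
field names come from the description above.

```java
/** Hypothetical stand-in for the test's fake job; not the class from the patch. */
class SpeculativeFlagSketch {
    private boolean hasSpeculativeMap;
    private boolean hasSpeculativeReduce;

    SpeculativeFlagSketch(boolean hasSpeculativeMap, boolean hasSpeculativeReduce) {
        this.hasSpeculativeMap = hasSpeculativeMap;
        this.hasSpeculativeReduce = hasSpeculativeReduce;
    }

    /** Returns a placeholder task name, or null when no speculative map is left. */
    String obtainSpeculativeMap() {
        if (!hasSpeculativeMap) {
            return null;
        }
        hasSpeculativeMap = false; // reset so the task is handed out only once
        return "speculative-map";
    }

    /** Returns a placeholder task name, or null when no speculative reduce is left. */
    String obtainSpeculativeReduce() {
        if (!hasSpeculativeReduce) {
            return null;
        }
        hasSpeculativeReduce = false; // reset so the task is handed out only once
        return "speculative-reduce";
    }

    boolean hasSpeculativeMap() {
        return hasSpeculativeMap;
    }

    boolean hasSpeculativeReduce() {
        return hasSpeculativeReduce;
    }
}
```

Resetting the flag means a second request from a tracker gets nothing from
this job, so the test can observe exactly one speculative task per job.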
> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4981
> URL: https://issues.apache.org/jira/browse/HADOOP-4981
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vivek Ratan
> Attachments: 4981.1.patch, 4981.2.patch, HADOOP-4981-1.patch,
> HADOOP-4981-2.patch, HADOOP-4981-3.patch, HADOOP-4981-4.patch
>
>
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a
> task from JobInProgress (calling obtainNewMapTask() or obtainNewReduceTask())
> only if the number of pending tasks for a job is greater than zero (see the
> if-block in TaskSchedulingMgr.getTaskFromJob()). So, if a job has no pending
> tasks and only has running tasks, it will never be given a slot, and will
> never have a chance to run a speculative task.
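The effect of the guard described in the issue can be sketched as below. This
is only an illustration: PendingGuardSketch and both method names are
hypothetical, and the relaxed variant is one possible way to let such jobs be
asked for a task, not necessarily what the attached patch does.

```java
/** Hypothetical illustration of the guard; not code from TaskSchedulingMgr. */
class PendingGuardSketch {
    /** Guard as described in the issue: only jobs with pending tasks are asked. */
    static boolean askedWithPendingOnlyGuard(int pendingTasks) {
        return pendingTasks > 0;
    }

    /** Assumed relaxation: a job with running tasks is also asked, so it may speculate. */
    static boolean askedWithRelaxedGuard(int pendingTasks, int runningTasks) {
        return pendingTasks > 0 || runningTasks > 0;
    }
}
```

A job with 0 pending and 1 running task is skipped by the first guard and
asked by the second, which is the case the issue is about.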