[jira] Updated: (HADOOP-3412) Refactor the scheduler out of the JobTracker

Tom White (JIRA) Thu, 17 Jul 2008 07:38:26 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tom White updated HADOOP-3412:
------------------------------

    Attachment: JobScheduler-v12.patch

Vivek, you're quite right. I think the hierarchy is a symptom of the abstract 
base class, TaskScheduler. It's mixing two concerns: listening to job state 
changes, and scheduling.

To fix this I propose that we break out the listener methods from the 
TaskScheduler into a JobInProgressListener interface:

{code}

interface JobInProgressListener {
  void jobAdded(JobInProgress job);
  void jobRemoved(JobInProgress job);
  void jobUpdated(JobInProgress job);
}

abstract class TaskScheduler {
  public void start() throws IOException {}
  public void terminate() throws IOException {}
  public abstract List<Task> assignTasks(TaskTrackerStatus taskTracker)
      throws IOException;
}

{code}

TaskSchedulers can then use a choice of JobInProgressListener implementations. 
For example, JobQueueTaskScheduler has a JobQueueJobInProgressListener to 
maintain its job queue and an EagerTaskInitializationListener to do task 
initialization.

TaskSchedulers register their listeners with the JobTracker, so we add the 
following two methods to JobTracker (and the TaskTrackerManager interface):

{code}
public void addJobInProgressListener(JobInProgressListener listener);
public void removeJobInProgressListener(JobInProgressListener listener);
{code}
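
As a rough illustration of the registration pattern described above (this is not code from the patch), the sketch below wires a listener into a stand-in tracker. SketchJobTracker, its submit/complete methods, QueueListener, and the one-field JobInProgress are all hypothetical stubs; only the JobInProgressListener interface and the two add/remove method names come from this comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for Hadoop's JobInProgress; only a name is modelled here.
class JobInProgress {
  final String name;
  JobInProgress(String name) { this.name = name; }
}

// The listener interface as proposed in this comment.
interface JobInProgressListener {
  void jobAdded(JobInProgress job);
  void jobRemoved(JobInProgress job);
  void jobUpdated(JobInProgress job);
}

// Hypothetical, minimal stand-in for the JobTracker side of the contract.
class SketchJobTracker {
  // Copy-on-write so listeners can be added or removed while events are delivered.
  private final List<JobInProgressListener> listeners =
      new CopyOnWriteArrayList<JobInProgressListener>();

  public void addJobInProgressListener(JobInProgressListener listener) {
    listeners.add(listener);
  }

  public void removeJobInProgressListener(JobInProgressListener listener) {
    listeners.remove(listener);
  }

  // Hypothetical entry points that fan job events out to every listener.
  void submit(JobInProgress job) {
    for (JobInProgressListener l : listeners) {
      l.jobAdded(job);
    }
  }

  void complete(JobInProgress job) {
    for (JobInProgressListener l : listeners) {
      l.jobRemoved(job);
    }
  }
}

// Hypothetical listener that keeps a FIFO queue of live jobs for a scheduler to read.
class QueueListener implements JobInProgressListener {
  final List<JobInProgress> queue = new ArrayList<JobInProgress>();

  public void jobAdded(JobInProgress job) { queue.add(job); }
  public void jobRemoved(JobInProgress job) { queue.remove(job); }
  public void jobUpdated(JobInProgress job) { /* FIFO order is unaffected by updates */ }
}
```

A scheduler built this way would call addJobInProgressListener from start() and removeJobInProgressListener from terminate(), and would read its listener's queue when assigning tasks.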

In the future we might add a TaskInProgressListener interface that allows 
TaskSchedulers to listen for changes to tasks.

bq. I want to build a scheduler that limits concurrent tasks per job, but does 
not want to initialize jobs in a separate thread

To do this with the proposed changes you would override 
LimitTasksPerJobTaskScheduler#start so it doesn't start the 
EagerTaskInitializationListener, but instead creates a listener to initialize 
tasks according to its own policy.
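
A minimal sketch of that override, under stated assumptions rather than as the patch's actual code: SketchManager, the manager constructor argument, InlineInitializationListener, and InlineInitScheduler are hypothetical, the TaskScheduler shown omits assignTasks for brevity, and the bodies of EagerTaskInitializationListener and LimitTasksPerJobTaskScheduler are simplified stubs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hadoop's JobInProgress; only initialization state is modelled.
class JobInProgress {
  private volatile boolean initialized;
  void initTasks() { initialized = true; }
  boolean isInitialized() { return initialized; }
}

interface JobInProgressListener {
  void jobAdded(JobInProgress job);
  void jobRemoved(JobInProgress job);
  void jobUpdated(JobInProgress job);
}

// Hypothetical stand-in for the registration methods added to JobTracker.
class SketchManager {
  final List<JobInProgressListener> listeners = new ArrayList<JobInProgressListener>();
  public void addJobInProgressListener(JobInProgressListener l) { listeners.add(l); }
  public void removeJobInProgressListener(JobInProgressListener l) { listeners.remove(l); }
}

// Simplified TaskScheduler: assignTasks is omitted; the manager field is an assumption.
abstract class TaskScheduler {
  protected final SketchManager manager;
  TaskScheduler(SketchManager manager) { this.manager = manager; }
  public void start() throws IOException {}
  public void terminate() throws IOException {}
}

// Stub: initializes each new job on a separate thread.
class EagerTaskInitializationListener implements JobInProgressListener {
  public void jobAdded(final JobInProgress job) {
    new Thread(new Runnable() {
      public void run() { job.initTasks(); }
    }).start();
  }
  public void jobRemoved(JobInProgress job) {}
  public void jobUpdated(JobInProgress job) {}
}

// Stub of the scheduler named above; by default it starts eager initialization.
class LimitTasksPerJobTaskScheduler extends TaskScheduler {
  LimitTasksPerJobTaskScheduler(SketchManager manager) { super(manager); }
  @Override
  public void start() throws IOException {
    manager.addJobInProgressListener(new EagerTaskInitializationListener());
  }
}

// Hypothetical listener with its own policy: initialize on the caller's thread.
class InlineInitializationListener implements JobInProgressListener {
  public void jobAdded(JobInProgress job) { job.initTasks(); }
  public void jobRemoved(JobInProgress job) {}
  public void jobUpdated(JobInProgress job) {}
}

// Hypothetical scheduler that replaces eager initialization with the inline policy.
class InlineInitScheduler extends LimitTasksPerJobTaskScheduler {
  InlineInitScheduler(SketchManager manager) { super(manager); }
  @Override
  public void start() throws IOException {
    // super.start() is deliberately not called, so the eager listener is never registered.
    manager.addJobInProgressListener(new InlineInitializationListener());
  }
}
```

The point of the design is that the initialization policy lives entirely in the listener, so swapping it does not require touching the task-limiting logic of the scheduler itself.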

I've attached a new patch which implements the above (v12).

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v10.patch, JobScheduler-v11.patch, 
> JobScheduler-v12.patch, JobScheduler-v9.1.patch, JobScheduler-v9.2.patch, 
> JobScheduler-v9.patch, JobScheduler.patch, JobScheduler_v2.patch, 
> JobScheduler_v3.patch, JobScheduler_v3b.patch, JobScheduler_v4.patch, 
> JobScheduler_v5.patch, JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, 
> JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, JobScheduler_v6.patch, 
> JobScheduler_v7.1.patch, JobScheduler_v7.patch, JobScheduler_v8.patch, 
> RackAwareJobScheduler.java, SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is admittedly very naive. 
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But, it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed within the rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as 
> the current JobTracker. More precisely, it doesn't re-implement things like 
> code that seems never to be called or concurrency problems.
> I wrote TOCONFIRM where my proposal differs from the current 
> implementation, so you can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvement by lowering the 
> level required to contribute to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a 
> JobSchedulerInterface that many JobSchedulers could implement and to propose 
> alternatives to the current « FifoJobScheduler ».  If some of you have ideas 
> about that, please tell me ^^ I will also open issues for things marked as FIXME 
> in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
