[jira] Commented: (HADOOP-3412) Refactor the scheduler out of the JobTracker

Vivek Ratan (JIRA) Fri, 11 Jul 2008 03:33:04 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612828#action_12612828
 ]


Vivek Ratan commented on HADOOP-3412:
-------------------------------------

I wanted to add some more detail to my previous comment, and also address 
Brice's last comment on implementing queues. 

I earlier talked of a _JobQueueManager_, which is responsible for maintaining 
the collection of jobs submitted to a JT. This class would deal with how jobs 
are stored in memory or on disk. It would encapsulate the 
_jobsByPriority_ and (perhaps) _jobs_ data structures, which are currently in 
_JobTracker_. When a job is submitted, i.e., when _JobTracker.submitJob()_ is 
called, the _JobInProgress_ object representing the new job would be given to 
_JobQueueManager_ to maintain. _addJob()_ and _removeJob()_, or their 
equivalent, would be part of _JobQueueManager_. Whatever class derives from 
_TaskScheduler_ would invoke methods in _JobQueueManager_ to figure out what 
jobs to look at. 
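As a rough illustration of these responsibilities, here is a minimal Java sketch of _JobQueueManager_. It assumes a simplified, hypothetical _JobInProgress_ that carries only an id, a priority, and a start time, and _getJobsByPriority()_ and _getJob()_ are illustrative names, not existing Hadoop APIs. The _jobs_ map and _jobsByPriority_ list are meant to mirror, in spirit, the structures that currently live in _JobTracker_.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for Hadoop's JobInProgress.
class JobInProgress {
  final String jobId;
  final int priority;    // assumption: a lower value means a higher priority
  final long startTime;  // submission time, keeps FIFO order within a priority

  JobInProgress(String jobId, int priority, long startTime) {
    this.jobId = jobId;
    this.priority = priority;
    this.startTime = startTime;
  }
}

// Hypothetical JobQueueManager owning the job collections that JobTracker holds today.
class JobQueueManager {
  // Order by priority first, then by submission time.
  private static final Comparator<JobInProgress> PRIORITY_ORDER =
      Comparator.comparingInt((JobInProgress j) -> j.priority)
          .thenComparingLong(j -> j.startTime);

  private final Map<String, JobInProgress> jobs = new TreeMap<>();      // job id -> job
  private final List<JobInProgress> jobsByPriority = new ArrayList<>(); // kept sorted

  // Would be called from JobTracker.submitJob() with the newly created JobInProgress.
  synchronized void addJob(JobInProgress job) {
    jobs.put(job.jobId, job);
    jobsByPriority.add(job);
    jobsByPriority.sort(PRIORITY_ORDER);
  }

  // Would be called when a job completes or is killed.
  synchronized void removeJob(JobInProgress job) {
    jobs.remove(job.jobId);
    jobsByPriority.remove(job);
  }

  // Queried by a TaskScheduler subclass to decide which jobs to look at;
  // returns a read-only snapshot so callers cannot modify the internal list.
  synchronized List<JobInProgress> getJobsByPriority() {
    return Collections.unmodifiableList(new ArrayList<>(jobsByPriority));
  }

  synchronized JobInProgress getJob(String jobId) {
    return jobs.get(jobId);
  }
}
```

In this sketch, _JobTracker.submitJob()_ would only create the _JobInProgress_ and pass it to _addJob()_; the scheduler never touches the underlying collections directly.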

It's a bit tricky to decide what _JobQueueManager_ should look like. HADOOP-3421 
introduces the concept of queues, with jobs being submitted to queues, so you 
will have multiple queues, each queue containing jobs, and perhaps supporting 
priorities and limits and such. Should _JobQueueManager_ have an explicit 
notion of queue names, so that, for example, you could get sorted jobs from a 
single queue? Or maybe you get sorted jobs from a collection of queues. For 
example, you could have a method in _JobQueueManager_ as follows: 
{noformat}
Collection<JobInProgress> getJobs(String queueName,
                                  Comparator<JobInProgress> jobComparator);
{noformat}

I don't want to get into too many details here; they probably belong on 
HADOOP-3445. My point is, when designing _JobQueueManager_, or when modelling the 
right interface for _JobQueue_, as Brice was talking about, you need to keep in 
mind that there could be multiple queues of jobs, and sometimes you may want to 
get jobs from a single queue, and sometimes from multiple queues. Again, why 
does this matter? 
* I like that Tom did away with the _JobQueue_ abstract class. Its presence 
forced a scheduler to assume there was only one queue of jobs (sure, you 
needn't have used _JobQueue_, but then it would be wasted), which would create 
problems for HADOOP-3445.
* The design of _JobQueueManager_ is a bit tricky, depending on whether you 
want to assume there will always be multiple queues of jobs, and whether you 
want to fetch jobs from multiple or single queues in one call. As a *temporary 
alternative*, until we figure out the right design, you could leave the 
functionality of maintaining jobs in the _JobTracker_ class. This class could 
continue supporting _jobsByPriority_ and _jobs_, and expose them directly to 
_TaskScheduler_, so that the latter could look at the jobs it needed to. 
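The temporary alternative in the last bullet could look roughly like the following sketch: _JobTracker_ keeps owning _jobs_ and _jobsByPriority_ and exposes them directly to the scheduler. _setJobTracker()_, _selectJob()_, and _FifoTaskScheduler_ are hypothetical names chosen for illustration, and _JobInProgress_ is reduced to an id and a priority.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for Hadoop's JobInProgress.
class JobInProgress {
  final String jobId;
  final int priority; // assumption: a lower value means a higher priority

  JobInProgress(String jobId, int priority) {
    this.jobId = jobId;
    this.priority = priority;
  }
}

// Hypothetical JobTracker fragment: job storage stays where it is today.
class JobTracker {
  private final Map<String, JobInProgress> jobs = new TreeMap<>();
  private final List<JobInProgress> jobsByPriority = new ArrayList<>();

  synchronized void submitJob(JobInProgress job) {
    jobs.put(job.jobId, job);
    jobsByPriority.add(job);
    // ArrayList.sort is stable, so equal-priority jobs stay in submission order.
    jobsByPriority.sort(Comparator.comparingInt((JobInProgress j) -> j.priority));
  }

  // Exposed directly to the scheduler as read-only snapshots.
  synchronized List<JobInProgress> getJobsByPriority() {
    return Collections.unmodifiableList(new ArrayList<>(jobsByPriority));
  }

  synchronized Map<String, JobInProgress> getJobs() {
    return Collections.unmodifiableMap(new TreeMap<>(jobs));
  }
}

// Hypothetical TaskScheduler that reads jobs straight from the JobTracker.
abstract class TaskScheduler {
  protected JobTracker jobTracker;

  void setJobTracker(JobTracker jobTracker) {
    this.jobTracker = jobTracker;
  }

  // Hypothetical hook: the job that should receive the next free task slot, or null.
  abstract JobInProgress selectJob();
}

class FifoTaskScheduler extends TaskScheduler {
  @Override
  JobInProgress selectJob() {
    List<JobInProgress> jobs = jobTracker.getJobsByPriority();
    return jobs.isEmpty() ? null : jobs.get(0);
  }
}
```

Returning snapshots keeps the scheduler from mutating _JobTracker_ state, at the cost of a copy per call; returning the live lists instead would require the scheduler to hold the _JobTracker_ lock while iterating.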

Yet another alternative is to leave _TaskScheduler_ as is, wait until we have a 
patch for HADOOP-3445, and then decide whether and how to design _JobQueueManager_. 
This may change _TaskScheduler_ by moving _addJob()_ and _removeJob()_ out of it, 
but at least you'd have a much better idea of what you want in _JobQueueManager_. 
I personally prefer this approach, but keep in mind that it may end up changing 
_TaskScheduler_.
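For comparison, this is roughly the shape _TaskScheduler_ would keep under the wait-and-see approach: the _JobTracker_ notifies the scheduler through _addJob()_ and _removeJob()_, and each scheduler stores jobs however it likes. The classes below, including _selectJob()_ and _FifoTaskScheduler_, are a hypothetical simplification and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Hadoop's JobInProgress.
class JobInProgress {
  final String jobId;

  JobInProgress(String jobId) {
    this.jobId = jobId;
  }
}

// Hypothetical TaskScheduler that keeps the addJob()/removeJob() hooks.
abstract class TaskScheduler {
  // Called by the JobTracker when a job is submitted.
  abstract void addJob(JobInProgress job);

  // Called by the JobTracker when a job completes or is killed.
  abstract void removeJob(JobInProgress job);

  // Hypothetical hook: the job whose task should run next, or null if none.
  abstract JobInProgress selectJob();
}

// A FIFO scheduler that maintains its own single queue of jobs.
class FifoTaskScheduler extends TaskScheduler {
  private final List<JobInProgress> queue = new ArrayList<>();

  @Override
  synchronized void addJob(JobInProgress job) {
    queue.add(job);
  }

  @Override
  synchronized void removeJob(JobInProgress job) {
    queue.remove(job);
  }

  @Override
  synchronized JobInProgress selectJob() {
    return queue.isEmpty() ? null : queue.get(0);
  }
}
```

If a _JobQueueManager_ is introduced later, _addJob()_ and _removeJob()_ would move into it and _selectJob()_ would query the manager instead; that is the kind of change to _TaskScheduler_ this approach may eventually require.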

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v9.1.patch, JobScheduler-v9.patch, 
> JobScheduler.patch, JobScheduler_v2.patch, JobScheduler_v3.patch, 
> JobScheduler_v3b.patch, JobScheduler_v4.patch, JobScheduler_v5.patch, 
> JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, JobScheduler_v6.3.patch, 
> JobScheduler_v6.4.patch, JobScheduler_v6.patch, JobScheduler_v7.1.patch, 
> JobScheduler_v7.patch, JobScheduler_v8.patch, RackAwareJobScheduler.java, 
> SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is admittedly very naive. 
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed within the rest of the JobTracker.
> This issue is the first step of an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not behave exactly like the current 
> JobTracker. More precisely, it does not re-implement code that never seems 
> to be called, nor the concurrency problems.
> I wrote TOCONFIRM wherever my proposal differs from the current 
> implementation, so you can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvements by lowering the 
> barrier to contributing to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a 
> JobSchedulerInterface that many JobSchedulers could implement, and to propose 
> alternatives to the current « FifoJobScheduler ». If any of you have ideas 
> about that, please tell me ^^ I will also open issues for things marked as 
> FIXME in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
