[
https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612828#action_12612828
]
Vivek Ratan commented on HADOOP-3412:
-------------------------------------
I wanted to add some more detail to my previous comment, and also address
Brice's last comment on implementing queues.
I earlier talked of a _JobQueueManager_, which is responsible for maintaining
the collection of jobs submitted to a JT. This class would deal with how jobs
are stored in memory or on disk. It really would encapsulate the
_jobsByPriority_ and _jobs_ (perhaps) data structures, which are currently in
_JobTracker_. When a job is submitted, i.e., when _JobTracker.submitJob()_ is
called, the _JobInProgress_ object representing the new job would be given to
_JobQueueManager_ to maintain. _addJob()_ and _removeJob()_, or their
equivalent, would be part of _JobQueueManager_. Whatever class derives from
_TaskScheduler_ would invoke methods in _JobQueueManager_ to figure out what
jobs to look at.
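As a rough illustration of that split, here is a minimal sketch. It is not from any attached patch: _JobStub_ stands in for _JobInProgress_, and _SimpleJobQueueManager_ and _FifoSchedulerSketch_ are hypothetical names that assume the manager owns the job collection and the scheduler only reads from it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Hadoop's JobInProgress, reduced to what this sketch needs (assumption).
class JobStub {
    final String jobId;

    JobStub(String jobId) {
        this.jobId = jobId;
    }
}

// Hypothetical manager: owns the job collection that currently lives in JobTracker.
class SimpleJobQueueManager {
    // LinkedHashMap keeps submission order, which is enough for a FIFO view.
    private final Map<String, JobStub> jobs = new LinkedHashMap<>();

    // Would be called from JobTracker.submitJob() in this design.
    synchronized void addJob(JobStub job) {
        jobs.put(job.jobId, job);
    }

    synchronized JobStub removeJob(String jobId) {
        return jobs.remove(jobId);
    }

    // Returns a snapshot so the scheduler can iterate without holding the lock.
    synchronized List<JobStub> getJobs() {
        return new ArrayList<>(jobs.values());
    }
}

// Hypothetical scheduler: asks the manager which jobs to look at, stores none itself.
class FifoSchedulerSketch {
    private final SimpleJobQueueManager manager;

    FifoSchedulerSketch(SimpleJobQueueManager manager) {
        this.manager = manager;
    }

    // Oldest submitted job, or null when nothing is queued.
    JobStub nextJob() {
        List<JobStub> snapshot = manager.getJobs();
        return snapshot.isEmpty() ? null : snapshot.get(0);
    }
}
```

The point of the sketch is only the direction of the dependency: how jobs are stored (memory, disk, ordering) stays behind the manager, and a scheduler never needs its own _addJob()_ or _removeJob()_.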
It's a bit tricky as to what _JobQueueManager_ should look like. HADOOP-3421
brings in the concept of queues, and jobs being submitted to queues. So you
will have multiple queues, each queue containing jobs, and perhaps supporting
priorities and limits and such. Should _JobQueueManager_ have an explicit
notion of queue names, so that, for example, you could get sorted jobs from a
single queue? Or maybe you get sorted jobs from a collection of queues. For
example, you could have a method in _JobQueueManager_ as follows:
{noformat}
Collection<JobInProgress> getJobs(String queueName, Comparator<JobInProgress> jobComparator)
{noformat}
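To make the single-queue versus multiple-queue question concrete, here is one possible shape, offered only as a sketch. _QueuedJob_ stands in for _JobInProgress_, and _QueueAwareJobManager_, the integer priority field, and the overload taking a collection of queue names are all assumptions for illustration, not part of any patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for JobInProgress (assumption): an id plus a priority to sort on.
class QueuedJob {
    final String jobId;
    final int priority; // assumption for this sketch: lower value = more urgent

    QueuedJob(String jobId, int priority) {
        this.jobId = jobId;
        this.priority = priority;
    }
}

// Hypothetical manager with an explicit notion of queue names.
class QueueAwareJobManager {
    private final Map<String, List<QueuedJob>> jobsByQueue = new HashMap<>();

    synchronized void addJob(String queueName, QueuedJob job) {
        jobsByQueue.computeIfAbsent(queueName, k -> new ArrayList<>()).add(job);
    }

    // Sorted jobs from a single queue; delegates to the multi-queue variant.
    synchronized Collection<QueuedJob> getJobs(String queueName,
                                               Comparator<QueuedJob> jobComparator) {
        return getJobs(Collections.singletonList(queueName), jobComparator);
    }

    // Sorted jobs merged from several queues; unknown queue names contribute nothing.
    synchronized Collection<QueuedJob> getJobs(Collection<String> queueNames,
                                               Comparator<QueuedJob> jobComparator) {
        List<QueuedJob> result = new ArrayList<>();
        for (String name : queueNames) {
            result.addAll(jobsByQueue.getOrDefault(name, Collections.emptyList()));
        }
        result.sort(jobComparator);
        return result;
    }
}
```

With this shape a scheduler that only cares about one queue calls the first method, while one that balances across queues passes several names and a comparator of its choosing. Whether both variants are needed is exactly the open design question.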
I don't want to get into too many details here. These probably belong more on
HADOOP-3445. My point is, when designing _JobQueueManager_, or when modelling the
right interface for _JobQueue_, as Brice was talking about, you need to keep in
mind that there could be multiple queues of jobs, and sometimes you may want to
get jobs from a single queue, and sometimes from multiple queues. Again, why
does this matter?
* I like that Tom did away with the _JobQueue_ abstract class. Its presence
forced a scheduler to assume there was only one queue of jobs (sure, you
needn't have used _JobQueue_, but then it would be wasted), which would create
problems for 3445.
* The design of _JobQueueManager_ is a bit tricky, depending on whether you
want to assume there will always be multiple queues of jobs, and whether you
want to fetch jobs from multiple or single queues in one call. As a *temporary
alternative*, till we figure out the right design, you could leave the
functionality of maintaining jobs in the _JobTracker_ class. This class could
continue supporting _jobsByPriority_ and _jobs_, and expose them directly to
_TaskScheduler_, so that the latter could look at the jobs it needed to.
Yet another alternative is to leave _TaskScheduler_ as is, wait till we have a
patch for 3445, then look at if/how we design _JobQueueManager_. It may end up
changing _TaskScheduler_ by moving _addJob()_ and _removeJob()_, but at least
you'd have a much better idea of what you want in _JobQueueManager_. I
personally prefer this approach, but keep in mind, it may end up changing
_TaskScheduler_.
> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
> Key: HADOOP-3412
> URL: https://issues.apache.org/jira/browse/HADOOP-3412
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Brice Arnould
> Assignee: Brice Arnould
> Priority: Minor
> Fix For: 0.19.0
>
> Attachments: JobScheduler-v9.1.patch, JobScheduler-v9.patch,
> JobScheduler.patch, JobScheduler_v2.patch, JobScheduler_v3.patch,
> JobScheduler_v3b.patch, JobScheduler_v4.patch, JobScheduler_v5.patch,
> JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, JobScheduler_v6.3.patch,
> JobScheduler_v6.4.patch, JobScheduler_v6.patch, JobScheduler_v7.1.patch,
> JobScheduler_v7.patch, JobScheduler_v8.patch, RackAwareJobScheduler.java,
> SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is admittedly very naive.
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable.
> But, it is hard to implement and compare schedulers, because the scheduling
> logic is mixed within the rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It
> re-implements the current scheduling algorithm in a separate class called
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as
> the current JobTracker. More precisely, it doesn't re-implement things like
> code that seems never to be called, or concurrency problems.
> I wrote TOCONFIRM where my proposal differs from the current
> implementation, so you can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of
> the scheduling. I hope that it will help future improvement by lowering the
> level required to contribute to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a
> JobSchedulerInterface that many JobSchedulers could implement and to propose
> alternatives to the current « FifoJobScheduler ». If some of you have ideas
> about that, please tell me ^^ I will also open issues for things marked as FIXME
> in the patch.