[ 
https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612519#action_12612519
 ] 

Vivek Ratan commented on HADOOP-3412:
-------------------------------------

Nice job, Tom. It's a lot cleaner and will make it easier for us to merge stuff 
from HADOOP-3445. I did have a couple of comments: 

A Scheduler is usually just an algorithm for deciding which task to pick for a 
TT. It uses information from the TT (its hostname, rack, what it's currently 
running, how many resources it has free), and information from what is required 
(which job/queue to look at, capacities, user limits) and makes the best match. 
I'm wondering if _TaskScheduler_ should handle addition/removal of jobs. As we 
make the JT persistent, we need the ability to persist job state to disk, to 
initialize jobs (expand their task-based structures, as in 
JobInProgress::Init()) dynamically (since, in order to scale, you don't want 
jobs to be expanded unless absolutely needed), to store only relevant 
information in memory and the rest on disk. Something else should likely do 
this, not  _TaskScheduler_.  _TaskScheduler_ needs to access the collection of 
jobs when it runs its scheduling algorithms, but it should not be responsible 
for them. Methods like _addJob()_ and _removeJob()_ probably belong to some 
other class, something like a _JobQueueManager_. Which, by the way, can also 
handle multiple queues of jobs, as we'll need for 3445. Maybe the JT itself can 
handle the queues of jobs initially. Regardless, do you think  _TaskScheduler_ 
should be responsible for jobs? 

Another thought I had is regarding the work we're doing for 3445. It's more of 
an observation than a suggestion. HADOOP-3421 introduces multiple queues, 
capacity per queue, and user limits. Each of these features affects the 
scheduling of tasks, which likely would go something like this: a TT indicates 
that it has one or more free Map or Reduce slots. JT figures out whether to 
look for a Map or Reduce task. JT needs to find a task in a job in a queue, so 
it first looks at which queue to consider (primarily based on queue capacities 
and whether the queue can accept a TT slot). Within the selected queue, the JT 
considers which job to look at, based on user limits (the job needs to be 
belong to a user who is not using more capacity than he/she is allowed) and 
priorities. Finally, within a job, the JT (actually, the JobInProgress object) 
needs to pick what task to run, based on data locality, speculation, and some 
other heuristics. My guess is that many developers will want to plug in tweaks 
to some of the pieces of entire scheduling algorithm, and not modify other 
logic. Someone, for example, may want to tweak how a queue is chosen, and not 
touch the other stuff. For this, IMO, we need to break down the JT's scheduling 
flow (decide on M or R task, then pick a queue, then pick a job, then pick a 
task), which now sits in _JobQueueTaskScheduler_, into discrete units and allow 
folks to override one or more of these units. There are a few ways to do this 
and we can use some of the suggestions and principles applied in this patch 
there as well. I guess I'm really making a plug here for folks to look at 
extending the stuff for 3445 (once it is ready) in a similar way, so that the 
entire scheduling flow can be made extensible at different steps. 

Once again, I don't think merging the 3445 work with this patch should be very 
hard, especially with the latest patch. I'll take a look at that soon. Thanks, 
Brice for your offer to help. I'm sure you can help us out there. And of 
course, I'm glad you took this whole effort up and pushed it all the way to 
where it is. Nice work. 


> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v9.patch, JobScheduler.patch, 
> JobScheduler_v2.patch, JobScheduler_v3.patch, JobScheduler_v3b.patch, 
> JobScheduler_v4.patch, JobScheduler_v5.patch, JobScheduler_v6.1.patch, 
> JobScheduler_v6.2.patch, JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, 
> JobScheduler_v6.patch, JobScheduler_v7.1.patch, JobScheduler_v7.patch, 
> JobScheduler_v8.patch, RackAwareJobScheduler.java, 
> SimpleResourceAwareJobScheduler.java
>
>
> First I would like warn you that my proposition is assumed to be very naive. 
> I just hope that reading it won't make you lose time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But, it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed within the rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bug fixed as a side effects
> This patch probably cannot be submited as it is.
> A first difficulty is that it does not have exactly the same behaviour than 
> the current JobTracker. More precisely, it doesn't re-implement things like 
> code that seems to be never called or concurency problems.
> I wrote TOCONFIRM where my proposition differ from the current 
> implementation, so you can find them easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvement by lowering the 
> level required to contribute to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step the right direction, I will try to propose a 
> JobSchedulerInterface that many JobSchedulers could implement and to propose 
> alternatives to the current « FifoJobScheduler ».  If some of you have ideas 
> about that please tell ^^ I will also open issues for things marked as FIXME 
> in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to