[jira] Commented: (HADOOP-3412) Refactor the scheduler out of the JobTracker

Vivek Ratan (JIRA) Fri, 11 Jul 2008 02:30:56 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612816#action_12612816
 ]


Vivek Ratan commented on HADOOP-3412:
-------------------------------------

@Matei:
>> Vivek, regarding job persistence - I think the addJob and removeJob methods 
>> in the TaskScheduler are only meant to be "listener" methods to notify it 
>> that a job should be considered for scheduling. The JobTracker still keeps a 
>> list of jobs in the jobs variable, so it is the ultimate "owner" of the job 
>> list. Thus it should be possible to persist the jobs in the JobTracker or 
>> JobQueueManager or some other class and just add/remove them from the 
>> scheduler when they become schedulable.

My point was that the Scheduler should get whatever jobs it needs to consider 
from whoever manages jobs. It shouldn't maintain a separate list. Suppose you 
have one Scheduler that wants to look at all submitted jobs before deciding 
which task is best, and another that only wants to look at the job with the 
highest priority and pick a task from it. In the first case, the caller (JT) 
needs to invoke the Scheduler's _addJob()_ method for every job. In the second 
case, it needs to invoke _addJob()_ for only one job. That makes the caller 
depend on the scheduler's policy, when the caller's code should be the same 
regardless of which scheduler is used behind the scenes. What should really 
happen is that when the first scheduler is called, it looks at all the jobs by 
fetching them from a _JobManager_, or whatever class it is that handles jobs. 
The second scheduler will call the _JobManager_ in a different way. The 
scheduler user's code is not affected. You shouldn't tell the Scheduler what 
jobs to consider - that decision is part of the Scheduler's internals. 
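
To make this concrete, here is a minimal sketch in Java of a scheduler that 
pulls jobs rather than being told about them. All names (_Job_, _JobManager_, 
_Scheduler_, _pickJob()_, and the two scheduler classes) are made up for 
illustration and are not taken from the attached patches, and both policies are 
placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical job record; NOT Hadoop's JobInProgress.
final class Job {
    final String id;
    final int priority;      // assumption: larger value means more urgent
    final int pendingTasks;  // assumption: tasks still waiting to run

    Job(String id, int priority, int pendingTasks) {
        this.id = id;
        this.priority = priority;
        this.pendingTasks = pendingTasks;
    }
}

// Hypothetical single owner of the job list.
final class JobManager {
    private final List<Job> jobs = new ArrayList<>();

    void submit(Job job) { jobs.add(job); }

    // Read-only view of every submitted job.
    List<Job> allJobs() { return Collections.unmodifiableList(jobs); }

    // The job with the highest priority, or null if there are no jobs.
    Job highestPriorityJob() {
        return jobs.stream()
                   .max(Comparator.comparingInt((Job j) -> j.priority))
                   .orElse(null);
    }
}

// Hypothetical scheduler interface: note there is no addJob()/removeJob().
interface Scheduler {
    // Returns the id of the job to take the next task from, or null if none.
    String pickJob();
}

// Looks at all submitted jobs; as a placeholder policy it prefers the job
// with the fewest pending tasks.
final class ScanAllScheduler implements Scheduler {
    private final JobManager jobManager;

    ScanAllScheduler(JobManager jobManager) { this.jobManager = jobManager; }

    @Override
    public String pickJob() {
        Job best = null;
        for (Job job : jobManager.allJobs()) {
            if (best == null || job.pendingTasks < best.pendingTasks) {
                best = job;
            }
        }
        return best == null ? null : best.id;
    }
}

// Only ever asks the JobManager for the single highest-priority job.
final class TopPriorityScheduler implements Scheduler {
    private final JobManager jobManager;

    TopPriorityScheduler(JobManager jobManager) { this.jobManager = jobManager; }

    @Override
    public String pickJob() {
        Job job = jobManager.highestPriorityJob();
        return job == null ? null : job.id;
    }
}
```

With this shape the caller invokes _pickJob()_ the same way on either 
scheduler, so swapping schedulers changes no caller code. Each scheduler 
decides for itself how much of the job list to read.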




> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v9.patch, JobScheduler.patch, 
> JobScheduler_v2.patch, JobScheduler_v3.patch, JobScheduler_v3b.patch, 
> JobScheduler_v4.patch, JobScheduler_v5.patch, JobScheduler_v6.1.patch, 
> JobScheduler_v6.2.patch, JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, 
> JobScheduler_v6.patch, JobScheduler_v7.1.patch, JobScheduler_v7.patch, 
> JobScheduler_v8.patch, RackAwareJobScheduler.java, 
> SimpleResourceAwareJobScheduler.java
>
>
> First I would like to warn you that my proposition should be assumed to be very naive. 
> I just hope that reading it won't make you lose time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But, it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed within the rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as 
> the current JobTracker. More precisely, it doesn't re-implement things like 
> code that seems to be never called or concurrency problems.
> I wrote TOCONFIRM where my proposition differs from the current 
> implementation, so you can find them easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvement by lowering the 
> level required to contribute to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a 
> JobSchedulerInterface that many JobSchedulers could implement and to propose 
> alternatives to the current « FifoJobScheduler ».  If some of you have ideas 
> about that please tell ^^ I will also open issues for things marked as FIXME 
> in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
