[jira] Updated: (HADOOP-3412) Refactor the scheduler out of the JobTracker

Brice Arnould (JIRA) Mon, 30 Jun 2008 06:22:39 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Brice Arnould updated HADOOP-3412:
----------------------------------

    Attachment: JobScheduler_v7.1.patch

*Tom White*
bq. Unless we're absolutely sure that we've got the interface right, I think 
JobQueue and TaskScheduler should be abstract classes. See 
https://issues.apache.org/jira/browse/HADOOP-1230?focusedCommentId=12573958#action_12573958
You mean that abstract classes are easier to evolve than interfaces, because 
they can provide default implementations for the methods we add later?
I hadn't thought of that. I made the change.

bq. We don't need iterator() and getSortedJobs() - iterator() is sufficient.
getSortedJobs() makes it possible to bias the choice of job according to the 
characteristics of the TaskTracker, which proved useful when I experimented 
with the API. This new proposal, however, provides a default implementation 
for it.
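
To illustrate the point about default implementations, here is a minimal sketch 
(not the code from the patch). It assumes JobQueue is an abstract class that is 
iterable over jobs; Job, TaskTrackerInfo, FifoJobQueue and RackBiasedJobQueue 
are hypothetical stand-ins, and the signature of getSortedJobs() is a guess. It 
is written in current Java for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; the real types in the patch may differ.
final class Job {
    final String name;
    final String preferredRack;
    Job(String name, String preferredRack) {
        this.name = name;
        this.preferredRack = preferredRack;
    }
}

final class TaskTrackerInfo {
    final String rack;
    TaskTrackerInfo(String rack) { this.rack = rack; }
}

abstract class JobQueue implements Iterable<Job> {
    // Default implementation: ignore the tracker and return the jobs
    // in iterator() order. Subclasses override it only if they need to.
    List<Job> getSortedJobs(TaskTrackerInfo tracker) {
        List<Job> jobs = new ArrayList<>();
        for (Job job : this) {
            jobs.add(job);
        }
        return jobs;
    }
}

// A plain FIFO queue only has to provide iterator().
class FifoJobQueue extends JobQueue {
    private final List<Job> jobs = new ArrayList<>();
    void add(Job job) { jobs.add(job); }
    @Override public Iterator<Job> iterator() { return jobs.iterator(); }
}

// A queue that biases the order by a characteristic of the TaskTracker
// (here, an assumed "rack" attribute).
class RackBiasedJobQueue extends FifoJobQueue {
    @Override
    List<Job> getSortedJobs(TaskTrackerInfo tracker) {
        List<Job> sorted = super.getSortedJobs(tracker);
        // Jobs preferring the tracker's rack come first (false sorts before
        // true). List.sort is stable, so FIFO order is kept within each group.
        sorted.sort(Comparator.comparing(
                (Job job) -> !job.preferredRack.equals(tracker.rack)));
        return sorted;
    }
}
```

With this shape, a scheduler can always call getSortedJobs(tracker); queues 
that do not care about the TaskTracker inherit the iterator() order for free.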

bq. What did you think of the idea of having JobLimitedTaskScheduler?
It is not a problem to add it, but I see two ways of doing so without 
duplicating most of assignTask():
 * By composition, adding a JobFilter subclass with two methods: 
isAcceptable(job, step) and getNumberOfSteps(). The first would tell whether a 
job is right for the step we're in, and the second the number of steps we need.
 * By inheritance, providing isAcceptable and getNumberOfSteps as methods of 
the DefaultTaskScheduler.

Both are easy to implement, but that new level of abstraction seems contrary to 
the KISS principle, unless we really need it for other filters. For now, the 
TaskScheduler does just one extra test (line 104) when limits are disabled, and 
one more (line 124) when limits are enabled. That might not justify the 
creation of another class and a filter concept (again, unless we need them for 
something else).
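
For reference, the composition variant could look roughly like the sketch 
below. Only isAcceptable(job, step) and getNumberOfSteps() come from the 
description above; JobInfo, AcceptAllFilter, SoftLimitFilter, 
FilteringTaskScheduler and pickJob() are hypothetical names, and the "soft 
limit" semantics (prefer jobs under their limit, then fall back to any job) is 
an assumption made only to show why more than one step can be useful. It is not 
the code from the patch.

```java
import java.util.List;

// Hypothetical view of a job, reduced to what the filter needs.
final class JobInfo {
    final String name;
    final int runningTasks;
    final int pendingTasks;
    JobInfo(String name, int runningTasks, int pendingTasks) {
        this.name = name;
        this.runningTasks = runningTasks;
        this.pendingTasks = pendingTasks;
    }
}

abstract class JobFilter {
    // Number of passes the scheduler makes over the job queue.
    abstract int getNumberOfSteps();
    // Whether the job may receive a task during the given pass.
    abstract boolean isAcceptable(JobInfo job, int step);
}

// Limits disabled: a single pass that accepts every job.
class AcceptAllFilter extends JobFilter {
    @Override int getNumberOfSteps() { return 1; }
    @Override boolean isAcceptable(JobInfo job, int step) { return true; }
}

// Assumed soft limit: the first pass takes only jobs under the limit,
// the second pass accepts any job so that slots are not left idle.
class SoftLimitFilter extends JobFilter {
    private final int maxRunningTasks;
    SoftLimitFilter(int maxRunningTasks) { this.maxRunningTasks = maxRunningTasks; }
    @Override int getNumberOfSteps() { return 2; }
    @Override boolean isAcceptable(JobInfo job, int step) {
        return step == 1 || job.runningTasks < maxRunningTasks;
    }
}

class FilteringTaskScheduler {
    private final JobFilter filter;
    FilteringTaskScheduler(JobFilter filter) { this.filter = filter; }

    // Returns the first job, in queue order, that has pending work and is
    // acceptable at the earliest possible step, or null if there is none.
    JobInfo pickJob(List<JobInfo> jobsInQueueOrder) {
        for (int step = 0; step < filter.getNumberOfSteps(); step++) {
            for (JobInfo job : jobsInQueueOrder) {
                if (job.pendingTasks > 0 && filter.isAcceptable(job, step)) {
                    return job;
                }
            }
        }
        return null;
    }
}
```

The selection loop is written once and the filter decides what each pass 
accepts, which is what avoids duplicating assignTask(). The cost is the extra 
class and concept discussed above.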

I fixed the warnings. Thanks for your advice. It's instructive ^^

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler.patch, JobScheduler_v2.patch, 
> JobScheduler_v3.patch, JobScheduler_v3b.patch, JobScheduler_v4.patch, 
> JobScheduler_v5.patch, JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, 
> JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, JobScheduler_v6.patch, 
> JobScheduler_v7.1.patch, JobScheduler_v7.patch, RackAwareJobScheduler.java, 
> SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is admittedly very naive. 
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed in with the rest of the JobTracker.
> This issue is the first step of an attempt to improve the Hadoop scheduler. 
> It re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as 
> the current JobTracker. More precisely, it doesn't re-implement things like 
> code that seems never to be called, or concurrency problems.
> I wrote TOCONFIRM wherever my proposal differs from the current 
> implementation, so you can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for the bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvements by lowering the 
> barrier to contributing to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose 
> a JobSchedulerInterface that many JobSchedulers could implement, and to 
> propose alternatives to the current « FifoJobScheduler ». If some of you have 
> ideas about that, please tell me ^^ I will also open issues for the things 
> marked as FIXME in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
