[jira] Commented: (HADOOP-3412) Refactor the scheduler out of the JobTracker

Vivek Ratan (JIRA) Wed, 16 Jul 2008 22:17:56 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614224#action_12614224
 ]


Vivek Ratan commented on HADOOP-3412:
-------------------------------------

bq. For this issue, which is about moving scheduling logic from the JobTracker 
to a scheduler class, I think we can leave out queues. We don't currently have 
the explicit concept of a queue, so I think it makes sense to commit this 
change, and continue the discussion about adding queues in HADOOP-3445. As 
discussed earlier, this Jira will not change the public APIs yet, so we can go 
on evolving the scheduling interface.

It's fine to move this discussion elsewhere, but I think it should be in a new 
Jira. HADOOP-3445 is specifically about implementing part of HADOOP-3421, and 
comments there should focus on the implementation (the algorithms for 
capacity redistribution, the algorithms for handling capacities and user 
limits, etc.). The new Jira should be about designing the Scheduler interface, 
given queues and perhaps some other new artifacts. 

bq. Would the taskUpdated method be called by JobTracker#updateTaskStatuses? I 
can see that it might be useful for schedulers to have this information, but 
perhaps this is something to add to the interface when a use case comes up? 
(TaskScheduler is an abstract class, so it's easy to add new methods to it.)

Just as a scheduler is a listener to jobs being added or deleted and to their 
states being modified, it should also be a listener to task states being 
modified. The HADOOP-3445 scheduler needs this. For example, it keeps track of how 
many tasks of a user are running (to handle user limits). So it needs to know 
when a task starts running or when it completes. It can compute this by 
iterating through all jobs, but being a listener to a task status change is 
convenient. I'm not sure where exactly TaskScheduler.updateTask() will be 
called, but task status changes and job status changes both seem to be needed. 
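
To make the user-limit example concrete, here is a minimal sketch of a scheduler-side listener that keeps a per-user count of running tasks up to date as task states change, instead of iterating through all jobs. Everything in it is an assumption for illustration: UserLimitSketch, TaskState, TaskStateListener, UserTaskCounter, and the taskUpdated signature are hypothetical names and are not part of the Hadoop API or of the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: none of these types are Hadoop APIs.
class UserLimitSketch {

    // Hypothetical, simplified set of task states.
    enum TaskState { PENDING, RUNNING, SUCCEEDED, FAILED }

    // Hypothetical callback a scheduler would implement to hear about task state changes.
    interface TaskStateListener {
        void taskUpdated(String user, TaskState oldState, TaskState newState);
    }

    // Maintains the number of running tasks per user incrementally.
    static class UserTaskCounter implements TaskStateListener {
        private final Map<String, Integer> running = new HashMap<>();

        @Override
        public synchronized void taskUpdated(String user, TaskState oldState, TaskState newState) {
            boolean wasRunning = oldState == TaskState.RUNNING;
            boolean isRunning = newState == TaskState.RUNNING;
            if (!wasRunning && isRunning) {
                // A task of this user started running.
                running.merge(user, 1, Integer::sum);
            } else if (wasRunning && !isRunning) {
                // A task of this user stopped running; drop the entry when it reaches zero.
                running.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
            }
        }

        synchronized int runningTasks(String user) {
            return running.getOrDefault(user, 0);
        }

        // True if the user may be given another task under the given limit.
        synchronized boolean underLimit(String user, int maxRunningPerUser) {
            return runningTasks(user) < maxRunningPerUser;
        }
    }

    public static void main(String[] args) {
        UserTaskCounter counter = new UserTaskCounter();
        counter.taskUpdated("alice", TaskState.PENDING, TaskState.RUNNING);
        counter.taskUpdated("alice", TaskState.PENDING, TaskState.RUNNING);
        counter.taskUpdated("alice", TaskState.RUNNING, TaskState.SUCCEEDED);
        System.out.println(counter.runningTasks("alice"));  // prints 1
        System.out.println(counter.underLimit("alice", 1)); // prints false
    }
}
```

With a callback like this, checking a user limit at assignment time is a map lookup. Where the JobTracker would invoke such a callback is left open here, as it is in the comment above.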

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v10.patch, JobScheduler-v11.patch, 
> JobScheduler-v9.1.patch, JobScheduler-v9.2.patch, JobScheduler-v9.patch, 
> JobScheduler.patch, JobScheduler_v2.patch, JobScheduler_v3.patch, 
> JobScheduler_v3b.patch, JobScheduler_v4.patch, JobScheduler_v5.patch, 
> JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, JobScheduler_v6.3.patch, 
> JobScheduler_v6.4.patch, JobScheduler_v6.patch, JobScheduler_v7.1.patch, 
> JobScheduler_v7.patch, JobScheduler_v8.patch, RackAwareJobScheduler.java, 
> SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is probably very naive. 
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But, it is hard to implement and compare schedulers, because the scheduling 
> logic is mixed within the rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as 
> the current JobTracker. More precisely, it doesn't re-implement things like 
> code that seems never to be called or concurrency problems.
> I wrote TOCONFIRM where my proposal differs from the current 
> implementation, so you can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide 
> about this patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of 
> the scheduling. I hope that it will help future improvement by lowering the 
> level required to contribute to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a 
> JobSchedulerInterface that many JobSchedulers could implement and to propose 
> alternatives to the current « FifoJobScheduler ». If some of you have ideas 
> about that, please tell me ^^ I will also open issues for things marked as FIXME 
> in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
