[ https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613241#action_12613241 ]

Vivek Ratan commented on HADOOP-3412:
-------------------------------------

bq. I'm still not sure I understand why jobAdded and jobRemoved should not be 
in the TaskScheduler. It's true that persistence of jobs should be managed by 
the JobQueueManager, but these methods are meant to be "listeners" [...]

After some thought, I'm unable to convincingly argue, even to myself, for the 
removal of the methods from _TaskScheduler_. 

My concern really was with state. On one hand, I see the Scheduler as a 
stateless algorithm: when it runs, it gets the information it needs about jobs 
from some other class. I was worried about any class that extends 
_TaskScheduler_ having to maintain its own data structures for jobs while a 
class like _JobQueueManager_ is also maintaining (similar?) structures. On the 
other hand, I see your point too: for efficiency, a scheduler may want to know 
what has changed since it last ran, rather than look at the entire set of jobs 
each time. A scheduler can certainly cache the information it needs (and 
perhaps even support listener methods, as you've suggested) if performance 
becomes an issue. But a scheduler also imposes a kind of state on the system, 
because it orders jobs in a particular way (one scheduler may use FIFO order, 
another may choose a different ordering), and perhaps this state is inherent 
to the scheduler. 

Like I said, I can't see a very strong reason for removing the job methods from 
_TaskScheduler_, so let's leave them there. 
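
The trade-off above, between a scheduler that reads the job set from another 
component every time it runs and one that keeps its own ordering current 
through listener callbacks, can be sketched roughly as follows. This is a 
simplified, hypothetical model, not the Hadoop API: only the names 
_TaskScheduler_, _JobQueueManager_, jobAdded and jobRemoved come from the 
discussion; Job, nextJob and both concrete scheduler classes are invented for 
illustration, and the JobQueueManager here is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical job descriptor: only an id and a submission time.
final class Job {
    final String id;
    final long submitTime;

    Job(String id, long submitTime) {
        this.id = id;
        this.submitTime = submitTime;
    }
}

// Hypothetical stand-in for the component that owns (and would persist) jobs.
final class JobQueueManager {
    private final List<Job> jobs = new ArrayList<>();

    void add(Job job) { jobs.add(job); }
    void remove(Job job) { jobs.remove(job); }
    List<Job> snapshot() { return new ArrayList<>(jobs); }
}

// Base scheduler keeps the listener hooks; by default they do nothing.
abstract class TaskScheduler {
    void jobAdded(Job job) {}
    void jobRemoved(Job job) {}

    // Returns the job whose tasks should be handed out next, or null if none.
    abstract Job nextJob();
}

// "Stateless" style: rescans the whole job set from JobQueueManager per call.
final class StatelessFifoScheduler extends TaskScheduler {
    private final JobQueueManager queue;

    StatelessFifoScheduler(JobQueueManager queue) { this.queue = queue; }

    @Override
    Job nextJob() {
        Job best = null;
        for (Job j : queue.snapshot()) {
            if (best == null || j.submitTime < best.submitTime) {
                best = j;
            }
        }
        return best;
    }
}

// "Listening" style: keeps its own FIFO ordering, updated by the hooks.
final class ListeningFifoScheduler extends TaskScheduler {
    private final PriorityQueue<Job> ordered =
        new PriorityQueue<>(Comparator.comparingLong((Job j) -> j.submitTime));

    @Override
    void jobAdded(Job job) { ordered.add(job); }

    @Override
    void jobRemoved(Job job) { ordered.remove(job); }

    @Override
    Job nextJob() { return ordered.peek(); }
}
```

Both schedulers pick the same job; the difference is where the ordering lives. 
The listening variant avoids rescanning every job on each call, at the cost of 
duplicating state that JobQueueManager also holds, which is exactly the 
concern raised above.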

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v9.1.patch, JobScheduler-v9.2.patch, 
> JobScheduler-v9.patch, JobScheduler.patch, JobScheduler_v2.patch, 
> JobScheduler_v3.patch, JobScheduler_v3b.patch, JobScheduler_v4.patch, 
> JobScheduler_v5.patch, JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, 
> JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, JobScheduler_v6.patch, 
> JobScheduler_v7.1.patch, JobScheduler_v7.patch, JobScheduler_v8.patch, 
> RackAwareJobScheduler.java, SimpleResourceAwareJobScheduler.java
>
>
> First, I would like to warn you that my proposal is admittedly very naive. 
> I just hope that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. 
> But it is hard to implement and compare schedulers, because the scheduling 
> logic is intertwined with the rest of the JobTracker.
> This issue is the first step in an attempt to improve the Hadoop scheduler. It 
> re-implements the current scheduling algorithm in a separate class called 
> JobScheduler. This new class is instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as is.
> The first difficulty is that it does not behave exactly like the current 
> JobTracker. More precisely, it doesn't re-implement things like code that 
> never seems to be called, or concurrency problems.
> I marked every place where my proposal differs from the current 
> implementation with TOCONFIRM, so you can find them easily.
> I know that fixing bugs silently is bad. So, regardless of what you decide 
> about this patch, I will open issues for the bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is that it documents each step of the 
> scheduling. I hope this will help future improvements by lowering the 
> barrier to contributing to the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making 
> it more parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose 
> a JobSchedulerInterface that many JobSchedulers could implement, and to propose 
> alternatives to the current « FifoJobScheduler ». If some of you have ideas 
> about that, please tell me ^^ I will also open issues for the things marked 
> as FIXME in the patch.
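
The direction described in the issue (the scheduling policy pulled out into its 
own class that the JobTracker instantiates and calls, with FIFO as just one 
implementation) might look roughly like the minimal sketch below. Only the 
names JobSchedulerInterface and FifoJobScheduler are taken from the text; 
SimpleJob, PriorityJobScheduler, MiniJobTracker and the order / 
pickJobForFreeSlot methods are hypothetical, and this is not the code in the 
attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical job record: id, priority (higher runs first), submit time.
final class SimpleJob {
    final String id;
    final int priority;
    final long submitTime;

    SimpleJob(String id, int priority, long submitTime) {
        this.id = id;
        this.priority = priority;
        this.submitTime = submitTime;
    }
}

// The scheduling policy, isolated from the tracker.
interface JobSchedulerInterface {
    // Returns the runnable jobs in the order they should be served.
    List<SimpleJob> order(List<SimpleJob> runnable);
}

// First in, first out, by submission time.
final class FifoJobScheduler implements JobSchedulerInterface {
    @Override
    public List<SimpleJob> order(List<SimpleJob> runnable) {
        List<SimpleJob> sorted = new ArrayList<>(runnable);
        sorted.sort(Comparator.comparingLong((SimpleJob j) -> j.submitTime));
        return sorted;
    }
}

// Hypothetical alternative: highest priority first, FIFO among equal priorities.
final class PriorityJobScheduler implements JobSchedulerInterface {
    @Override
    public List<SimpleJob> order(List<SimpleJob> runnable) {
        List<SimpleJob> sorted = new ArrayList<>(runnable);
        sorted.sort(Comparator.comparingInt((SimpleJob j) -> j.priority)
                .reversed()
                .thenComparingLong(j -> j.submitTime));
        return sorted;
    }
}

// Simplified tracker: it owns the jobs but delegates the ordering decision.
final class MiniJobTracker {
    private final JobSchedulerInterface scheduler;
    private final List<SimpleJob> jobs = new ArrayList<>();

    MiniJobTracker(JobSchedulerInterface scheduler) { this.scheduler = scheduler; }

    void submit(SimpleJob job) { jobs.add(job); }

    // Called when a slot frees up; returns the job to serve, or null if none.
    SimpleJob pickJobForFreeSlot() {
        List<SimpleJob> ordered = scheduler.order(jobs);
        return ordered.isEmpty() ? null : ordered.get(0);
    }
}
```

Swapping the policy only means passing a different object to the tracker's 
constructor; the tracker code itself does not change, which is what makes 
schedulers easier to implement and compare side by side.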

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
