Refactor the scheduler out of the JobTracker
--------------------------------------------
Key: HADOOP-3412
URL: https://issues.apache.org/jira/browse/HADOOP-3412
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Brice Arnould
Priority: Minor
Attachments: JobScheduler.patch
First, I would like to warn you that my proposal is probably quite naive. I
just hope that reading it won't waste your time.
h4. The aim
It seems to me that improving Hadoop scheduling could be very profitable.
However, it is hard to implement and compare schedulers because the scheduling
logic is intertwined with the rest of the JobTracker.
This issue is the first step of an attempt to improve the Hadoop scheduler. It
re-implements the current scheduling algorithm in a separate class called
JobScheduler. This new class is instantiated in the JobTracker.
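A minimal sketch of this separation, under stated assumptions: the class names SketchJob, SketchJobScheduler and SketchJobTracker, and every method on them, are hypothetical and are not taken from the attached patch or from Hadoop's real classes. It only illustrates the idea of a tracker that instantiates a scheduler object and delegates scheduling decisions to it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a job; not Hadoop's real job representation.
final class SketchJob {
    final String id;
    int pendingTasks;

    SketchJob(String id, int pendingTasks) {
        this.id = id;
        this.pendingTasks = pendingTasks;
    }
}

// All scheduling logic lives here (a simplified FIFO policy, as an assumption).
final class SketchJobScheduler {
    private final Deque<SketchJob> queue = new ArrayDeque<>();

    synchronized void addJob(SketchJob job) {
        queue.addLast(job);
    }

    // Returns the id of the job that receives the free slot,
    // or null when no job has pending tasks.
    synchronized String assignTask() {
        while (!queue.isEmpty()) {
            SketchJob head = queue.peekFirst();
            if (head.pendingTasks > 0) {
                head.pendingTasks--;
                return head.id;
            }
            queue.removeFirst(); // every task of this job has been handed out
        }
        return null;
    }
}

// The tracker only instantiates the scheduler and delegates to it.
final class SketchJobTracker {
    private final SketchJobScheduler scheduler = new SketchJobScheduler();

    void submitJob(SketchJob job) {
        scheduler.addJob(job);
    }

    // Called when a worker reports a free slot (hypothetical entry point).
    String heartbeat() {
        return scheduler.assignTask();
    }
}
```

Because the tracker never touches the queue directly, a different policy could be tried by changing only the scheduler class.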
h4. Bugs fixed as a side effect
This patch probably cannot be submitted as it is.
The first difficulty is that it does not behave exactly like the current
JobTracker. More precisely, it does not re-implement things such as code that
seems never to be called, or concurrency problems.
I wrote TOCONFIRM wherever my proposal differs from the current implementation,
so you can find these places easily.
I know that fixing bugs silently is bad. So, independently of what you decide
about this patch, I will open issues for bugs that you confirm.
h4. Other side effects
Another side effect of this patch is that it adds documentation about each step
of the scheduling. I hope this will help future improvements by lowering the
barrier to contributing to the scheduler.
It also reduces the complexity of the JobTracker and makes it finer-grained
(and therefore more parallel).
h4. The future
If you feel that this is a step in the right direction, I will try to propose a
JobSchedulerInterface that many JobSchedulers could implement, and to propose
alternatives to the current « FifoJobScheduler ». If some of you have ideas
about that, please tell me ^^ I will also open issues for things marked as
FIXME in the patch.
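To make the idea concrete, here is a rough sketch of what such an interface might look like. Only the names JobSchedulerInterface and FifoJobScheduler come from this description. The PendingJob class, the pickNext method and the SmallestJobFirstScheduler alternative are hypothetical, chosen purely for illustration, and the FIFO policy shown is a simplification.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical job description; real Hadoop jobs carry far more state.
final class PendingJob {
    final String id;
    final int remainingTasks;

    PendingJob(String id, int remainingTasks) {
        this.id = id;
        this.remainingTasks = remainingTasks;
    }
}

// Assumed shape of the proposed interface; the method is illustrative only.
interface JobSchedulerInterface {
    // Picks the job that should receive the next free slot, or null if none.
    PendingJob pickNext(List<PendingJob> jobsInSubmissionOrder);
}

// Simplified FIFO policy: the job submitted first is served first.
final class FifoJobScheduler implements JobSchedulerInterface {
    public PendingJob pickNext(List<PendingJob> jobs) {
        return jobs.isEmpty() ? null : jobs.get(0);
    }
}

// Hypothetical alternative: prefer the job with the fewest remaining tasks.
final class SmallestJobFirstScheduler implements JobSchedulerInterface {
    public PendingJob pickNext(List<PendingJob> jobs) {
        return jobs.stream()
                .min(Comparator.comparingInt((PendingJob j) -> j.remainingTasks))
                .orElse(null);
    }
}
```

With a shared interface, two policies can be compared by running them against the same list of jobs and looking at which job each one selects.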