Abhishek,
Welcome.
As far as I know there isn't a concerted effort, given the complexity
of the task at hand, as I'm sure you will appreciate.
There are several pieces slowly falling in place:
# The CapacityScheduler (CS) already allows for memory-based
scheduling (i.e. support for 'High RAM' jobs). This is a subtle
change: rather than looking at abstract 'slots', the CS treats every
machine as being made up of real memory slots.
# The TaskTracker in trunk (hadoop-0.22) already reports per-task
CPU and memory usage.
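To make the first point a bit more concrete, here is a rough sketch of
the idea of memory-backed slots. It is not CapacityScheduler code: the
class name, the method names, the round-up rule and the 1024 MB slot
size are all assumptions made for illustration. It only shows how a
'High RAM' task could be charged for several memory slots instead of
one abstract slot.

```java
// Hypothetical sketch; none of these names come from the Hadoop code base.
public class MemorySlotSketch {

    /**
     * Number of memory slots a task would occupy, rounding up
     * (assumed rule: a task is charged whole slots only).
     */
    public static int slotsForTask(long taskMemoryMb, long slotMemoryMb) {
        if (taskMemoryMb <= 0 || slotMemoryMb <= 0) {
            throw new IllegalArgumentException("memory sizes must be positive");
        }
        // Integer ceiling division.
        return (int) ((taskMemoryMb + slotMemoryMb - 1) / slotMemoryMb);
    }

    /** True if a machine with the given number of free slots can host the task. */
    public static boolean fits(int freeSlots, long taskMemoryMb, long slotMemoryMb) {
        return slotsForTask(taskMemoryMb, slotMemoryMb) <= freeSlots;
    }

    public static void main(String[] args) {
        long slotMb = 1024;   // assumed memory backing one slot
        int freeSlots = 2;    // assumed free slots on one machine

        System.out.println(slotsForTask(1024, slotMb));     // prints 1
        System.out.println(slotsForTask(3072, slotMb));     // prints 3
        System.out.println(fits(freeSlots, 3072, slotMb));  // prints false
    }
}
```

In this sketch an ordinary task still costs one slot, while a 3 GB task
costs three, so a machine with two free slots would not be offered it.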
Clearly it will take some effort to move from here to truly dynamic
slot-less scheduling, since the notion of slots is fairly deeply
entrenched in the framework (JobTracker, TaskTracker etc.).
Of course, I don't mean to discourage you!
Feel free to start opening JIRAs, jot your thoughts down,
make some proposals and get involved!
Arun
On Nov 16, 2010, at 4:51 AM, abhishek sharma wrote:
Hi,
In his e-mail on the Hadoop Common mailing list, Steve Loughran
mentioned the following:
"There's work underway to be more aware of system load when scheduling
things, rather than have a fairly simplistic "slot" model, look more
at system load and memory load as a way of measuring how idle machines
are. If you were to be really devious, you'd look at io load, network,
machine temperature, etc. If you find this an interesting problem to
get involved in, the mapreduce-dev mailing list is the place to get
involved."
I would like to get involved.
I recently finished my PhD in computer science from the Univ. of
Southern California. In one of my projects, I modified the MapReduce
scheduler to implement a particular job priority scheme.
Thanks,
Abhishek