+1 to Jeff's suggestions, especially on locality. I'd love to see some rigorous
work done so that the scheduler could prefer assigning tasks to the nodes
that already host the relevant input data. Generalizing this further so
that a full vertical integration of HDFS, HBase, and Map/Reduce could exploit
maximal data locality would be even cooler.
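
To make the idea concrete, here is a rough sketch of what a
locality-preferring assignment step might look like. This is purely
illustrative: the Task class and method names below are invented for
the example and are not actual Hadoop APIs.

    // Illustrative sketch of locality-preferring task assignment;
    // the Task class and method names are made up, not Hadoop APIs.
    import java.util.List;

    class Task {
        // Hosts holding replicas of this task's input split.
        private final List<String> splitHosts;
        Task(List<String> splitHosts) { this.splitHosts = splitHosts; }
        List<String> getSplitHosts() { return splitHosts; }
    }

    class LocalityAwareAssigner {
        // Prefer a pending task whose input split has a replica on the
        // requesting node, so the map reads from local disk; otherwise
        // fall back to any pending task (a remote read).
        static Task assign(String requestingHost, List<Task> pending) {
            for (Task t : pending) {
                if (t.getSplitHosts().contains(requestingHost)) {
                    return t; // data-local assignment
                }
            }
            return pending.isEmpty() ? null : pending.get(0); // non-local fallback
        }
    }

A real implementation would also have to weigh rack-level locality and
avoid starving tasks whose data is never local, but the basic preference
order is the interesting part.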

Chad

On 2/24/08 2:56 PM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote:

Hey Jaideep,

One interesting direction for research would be more sophisticated
scheduling policies for the JobTracker to help improve locality and overall
cluster utilization.  The introduction of speculative execution is a step in
this direction; you could perhaps investigate the implications of different
speculative execution policies on different job types.
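
To illustrate one possible policy (the names and threshold below are
invented for the example, not Hadoop's actual implementation): a backup
copy could be launched only for a task that is still running as a single
copy and lags the average progress of its peers by a fixed margin.

    // Illustrative speculative-execution policy sketch, not Hadoop's
    // actual logic: duplicate a task only when it falls sufficiently
    // far behind its peers and has no backup copy running yet.
    class SpeculationPolicy {
        static final double LAG_THRESHOLD = 0.20; // 20 points behind average

        static boolean shouldSpeculate(double taskProgress,    // 0.0 to 1.0
                                       double avgPeerProgress, // 0.0 to 1.0
                                       int runningCopies) {
            return runningCopies == 1
                && (avgPeerProgress - taskProgress) > LAG_THRESHOLD;
        }
    }

Comparing how variations of this rule (different thresholds, progress-rate
estimates instead of absolute progress) behave across job types could be
exactly the kind of study I mean.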

Regards,
Jeff

On Sun, Feb 24, 2008 at 9:41 AM, Jaideep Dhok <[EMAIL PROTECTED]>
wrote:

> Hello,
> I am a graduate research student in CS at the Search and Information
> Extraction Lab at IIIT Hyderabad, India (http://search.iiit.ac.in). I
> have been working on Nutch and Hadoop for the past couple of months,
> basically to get an understanding of the platform and to discover
> possible research areas for my thesis work. Most of the time I have
> been playing with the Hadoop code base, and by now I am quite
> familiar with its internals (especially the Map-Reduce part).
>
> I have been reading publications related to Map-Reduce, the Google
> File System, and similar systems, and I am still looking for
> interesting research topics. I was wondering if anyone would like to
> share or suggest any ideas related to the Hadoop platform. Any
> suggestions and comments are greatly appreciated.
>
> Thanks and Regards,
> Jaideep Dhok
>

