Hi all,
I just started using the Hadoop DFS last night, and it has already
solved a big performance problem we were having with throughput from
our shared NFS storage. Thanks to everyone who has contributed to
that project.
I wrote my own MapReduce implementation, because I needed two
features that Hadoop didn't have: Grid Engine integration and easy
record I/O (described below). I'm writing this message to see if
you're interested in these ideas for Hadoop, and to see what ideas I
might learn from you.
Grid Engine: All the machines available to me run Sun's Grid Engine
for job submission. Grid Engine is important for us, because it
makes sure that all of the users of a cluster get their fair share of
resources--as far as I can tell, the JobTracker assumes that one user
owns the machines. Is this shared scenario you're interested in
supporting? Would you consider supporting job submission systems
like Grid Engine or Condor?
Record I/O: My implementation is something like the
org.apache.hadoop.record package, but with a couple of
twists. In my implementation, you give the system a simple Java
class, like this:
public class WordCount {
    public String word;
    public long count;
}
and my TypeBuilder class generates code for all possible orderings of
this class (order by word, order by count, order by word then count,
order by count then word). Each ordering has its own hash function
and comparator.
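For example, the generated "order by word" code amounts to something
like this (the names here are illustrative, not the exact classes that
TypeBuilder emits):

import java.util.Comparator;

// Ordering of WordCount records by the "word" field only.
public class WordCountWordOrder implements Comparator<WordCount> {
    public int compare(WordCount a, WordCount b) {
        return a.word.compareTo(b.word);
    }

    // Hash only the fields that participate in the ordering, so records
    // that compare as equal also hash to the same value.
    public int hash(WordCount w) {
        return w.word.hashCode();
    }
}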
In addition, each ordering has its own serialization/deserialization
code. For example, if we order by count, the serialization code
stores only differences between adjacent counts to help with
compression.
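As a sketch (again with illustrative names, not the exact generated
class), the "order by count" serialization looks something like this;
because the records arrive in count order, the deltas stay small and
compress well:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Delta-encoded serialization for the "order by count" ordering.
// One instance is used per stream, so previousCount carries the running state.
public class WordCountCountSerializer {
    private long previousCount = 0;

    public void write(DataOutput out, WordCount w) throws IOException {
        out.writeLong(w.count - previousCount); // store the difference, not the raw count
        out.writeUTF(w.word);
        previousCount = w.count;
    }

    public WordCount read(DataInput in) throws IOException {
        WordCount w = new WordCount();
        w.count = previousCount + in.readLong(); // rebuild the count from the delta
        w.word = in.readUTF();
        previousCount = w.count;
        return w;
    }
}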
All this code is grouped into an Order object, which is accessed like
this:
String[] fields = { "word" };
Order<WordCount> order = (new WordCountType()).getOrder( fields );
This order object contains a hash function, a comparator, and
serialization logic for ordering WordCount objects by word.
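In a MapReduce-style job, the Order object plugs in wherever the
framework needs to compare, partition, or serialize records. A
hypothetical use (assuming method names like getComparator() and
hash(), which is roughly how my version is shaped) looks like:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrderExample {
    public static void main(String[] args) {
        String[] fields = { "word" };
        Order<WordCount> order = (new WordCountType()).getOrder(fields);

        List<WordCount> records = new ArrayList<WordCount>();
        // ... fill records ...

        // Sort a batch of records by word with the generated comparator.
        Collections.sort(records, order.getComparator());

        // Route each record to one of N reduce partitions with the generated hash.
        int numReducers = 8;
        for (WordCount w : records) {
            int partition = (order.hash(w) & Integer.MAX_VALUE) % numReducers;
            // ... send w to reducer 'partition' ...
        }
    }
}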
Is this code you'd be interested in?
Thanks,
Trevor
(By the way, Doug, you may remember me from a panel on open source
search at this year's OSIR workshop.)