Re: Combining MapReduce implementations

Doug Cutting Wed, 11 Oct 2006 09:33:04 -0700

Trevor Strohman wrote:

Grid Engine: All the machines available to me run Sun's Grid Engine for job submission. Grid Engine is important for us, because it makes sure that all of the users of a cluster get their fair share of resources--as far as I can tell, the JobTracker assumes that one user owns the machines. Is this shared scenario you're interested in supporting?

Yes. We'd like Hadoop's MapReduce to be able to live on top of such systems. Some are already experimenting with Hadoop on Condor, but I've not yet heard of anyone using Hadoop on Sun's Grid Engine.


http://issues.apache.org/jira/browse/HADOOP-428
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt

Would you consider supporting job submission systems like Grid Engine or Condor?

Definitely. I'm also interested in supporting Amazon's EC2, since it removes the need to purchase and maintain a cluster. In particular, Amazon's prices seem, for many applications, to be considerably cheaper than operating one's own cluster.

Record I/O: [ ...]
and my TypeBuilder class generates code for all possible orderings of this class (order by word, order by count, order by word then count, order by count then word). Each ordering has its own hash function and comparator.
In addition, each ordering has its own serialization/deserialization code. For example, if we order by count, the serialization code stores only differences between adjacent counts to help with compression.
Is this code you'd be interested in?

Yes, this sounds very interesting. Does it build on the Record IO classes or is it completely separate?
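
The count-ordered serialization Trevor describes can be illustrated with a small sketch. This is not his TypeBuilder output, which is not shown in the thread; it is a minimal Python illustration of the general idea, assuming records are (word, count) pairs. The function names encode_counts and decode_counts are hypothetical. Once records are sorted by count, only the difference from the previous count is stored, so the stored values stay small and compress well.

```python
from typing import List, Tuple


def encode_counts(counts: List[int]) -> List[int]:
    """Delta-encode counts that are already sorted in ascending order.

    The first value is stored as its difference from zero; every later
    value is stored as its difference from the preceding count.
    """
    deltas = []
    previous = 0
    for count in counts:
        deltas.append(count - previous)
        previous = count
    return deltas


def decode_counts(deltas: List[int]) -> List[int]:
    """Invert encode_counts by accumulating a running sum."""
    counts = []
    running = 0
    for delta in deltas:
        running += delta
        counts.append(running)
    return counts


# Hypothetical (word, count) records, for illustration only.
records: List[Tuple[str, int]] = [("the", 1005), ("of", 1002), ("a", 1000), ("and", 1002)]

# Order by count, then by word, so equal counts have a stable order.
by_count = sorted(records, key=lambda r: (r[1], r[0]))
sorted_counts = [count for _, count in by_count]

encoded = encode_counts(sorted_counts)
decoded = decode_counts(encoded)
```

After the first entry, the encoded values are small non-negative integers, which a variable-length integer format or a general-purpose compressor can store in fewer bytes than the raw counts.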


Thanks,

Doug
