Re: Combining MapReduce implementations

Trevor Strohman Wed, 11 Oct 2006 10:43:02 -0700


On Oct 11, 2006, at 12:32 PM, Doug Cutting wrote:

Yes. We'd like Hadoop's MapReduce to be able to live on top of such systems [Grid Engine]. Some are already experimenting with Hadoop on Condor, but I've not yet heard of anyone using Hadoop on Sun's Grid Engine.
http://issues.apache.org/jira/browse/HADOOP-428
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt
[...] I'm also interested in supporting Amazon's EC2 [...]

That's good to hear. For our own hardware, the critical issue is how we can share the resources efficiently with other people. Right now there are lots of people using these machines, and I'm the only one using MapReduce. Some people want to use MPI, some want to run standard applications that use NFS, etc. Grid Engine almost completely solves this sharing problem for us.


EC2 support sounds exciting.

Record I/O: [...]
and my TypeBuilder class generates code for all possible orderings of this class (order by word, order by count, order by word then count, order by count then word). Each ordering has its own hash function and comparator. In addition, each ordering has its own serialization/deserialization code. For example, if we order by count, the serialization code stores only the differences between adjacent counts to help with compression.
Is this code you'd be interested in?
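To make the per-ordering idea concrete, here is a minimal hand-written Java sketch (Java 16+ for record). It is not TypeBuilder's generated code and is unrelated to Hadoop's Record I/O classes; the WordCountOrderings class, the Ordering enum, the hash functions, the partition helper, and the varint delta format are all hypothetical. Each of the four orderings gets its own comparator and hash function (assumed here to hash exactly the fields its ordering compares, so records that compare equal land in the same partition), and a serializer specialized for count order stores only the difference between adjacent counts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class WordCountOrderings {
    // Hypothetical (word, count) record standing in for a generated type.
    record WordCount(String word, int count) {}

    // One comparator and one hash function per ordering. Each hash covers the
    // fields its ordering compares (an assumption made for this sketch).
    enum Ordering {
        BY_WORD(Comparator.comparing(WordCount::word),
                wc -> wc.word().hashCode()),
        BY_COUNT(Comparator.comparingInt(WordCount::count),
                wc -> Integer.hashCode(wc.count())),
        BY_WORD_THEN_COUNT(Comparator.comparing(WordCount::word).thenComparingInt(WordCount::count),
                wc -> 31 * wc.word().hashCode() + wc.count()),
        BY_COUNT_THEN_WORD(Comparator.comparingInt(WordCount::count).thenComparing(WordCount::word),
                wc -> 31 * wc.count() + wc.word().hashCode());

        final Comparator<WordCount> comparator;
        final ToIntFunction<WordCount> hash;

        Ordering(Comparator<WordCount> comparator, ToIntFunction<WordCount> hash) {
            this.comparator = comparator;
            this.hash = hash;
        }

        // Hypothetical partitioner: floorMod keeps the result in [0, n).
        int partition(WordCount wc, int n) { return Math.floorMod(hash.applyAsInt(wc), n); }
    }

    // Unsigned varint: 7 data bits per byte, high bit set means "more bytes follow".
    static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInputStream in) throws IOException {
        int value = 0, shift = 0, b;
        do {
            b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    // Serializer specialized for count order: each count is stored as the
    // difference from the previous count, so sorted input yields small numbers.
    static byte[] writeSortedByCount(List<WordCount> sorted) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVarInt(out, sorted.size());
        int previous = 0;
        for (WordCount wc : sorted) {
            int delta = wc.count() - previous;
            if (delta < 0) throw new IllegalArgumentException("input must be sorted by non-negative count");
            writeVarInt(out, delta);
            out.writeUTF(wc.word());
            previous = wc.count();
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<WordCount> readSortedByCount(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = readVarInt(in);
        List<WordCount> result = new ArrayList<>(n);
        int previous = 0;
        for (int i = 0; i < n; i++) {
            int count = previous + readVarInt(in); // undo the delta
            result.add(new WordCount(in.readUTF(), count));
            previous = count;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<WordCount> data = new ArrayList<>(List.of(
                new WordCount("the", 120), new WordCount("cat", 7),
                new WordCount("sat", 7), new WordCount("mat", 3)));
        data.sort(Ordering.BY_COUNT_THEN_WORD.comparator);
        byte[] encoded = writeSortedByCount(data);
        System.out.println(data + " -> " + encoded.length + " bytes");
        System.out.println("round trip ok: " + readSortedByCount(encoded).equals(data));
    }
}
```

For the sample data in main, sorting by count then word gives counts 3, 7, 7, 120, so the stored deltas are 3, 4, 0 and 113. Each fits in one varint byte instead of the four bytes a fixed-width writeInt would use, and runs of small values also tend to compress better with a general-purpose compressor applied afterwards.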
Yes, this sounds very interesting. Does it build on the Record IO classes or is it completely separate?

I'm afraid it's completely separate, although it's not much code. The TypeBuilder is ~600 lines of code right now, plus maybe 500 lines of additional support (compression classes, etc.).

It can't be considered a drop-in replacement for the record stuff; you've already got C++ support and complex record types. I don't know if it even makes sense to try to integrate the code I have, or if it should just serve as a proof of concept for a feature.


Trevor
