On Oct 11, 2006, at 12:32 PM, Doug Cutting wrote:

Yes. We'd like Hadoop's MapReduce to be able to live on top of such systems [Grid Engine]. Some are already experimenting with Hadoop on Condor, but I've not yet heard of anyone using Hadoop on Sun's Grid Engine.

http://issues.apache.org/jira/browse/HADOOP-428
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt

[...]  I'm also interested in supporting Amazon's EC2 [...]

That's good to hear. For our own hardware, the critical issue is how we can share the resources efficiently with other people. Right now there are lots of people using these machines, and I'm the only one using MapReduce. Some people want to use MPI, some want to run standard applications that use NFS, etc. Grid Engine almost completely solves this sharing problem for us.

EC2 support sounds exciting.

Record I/O: [ ...]
and my TypeBuilder class generates code for all possible orderings of this class (order by word, order by count, order by word then count, order by count then word). Each ordering has its own hash function and comparator. In addition, each ordering has its own serialization/deserialization code. For example, if we order by count, the serialization code stores only differences between adjacent counts to help with compression.
Is this code you'd be interested in?
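
As a rough illustration of the "store only differences between adjacent counts" idea described above, here is a minimal Java sketch. It is not the TypeBuilder-generated code: the class name DeltaCounts, the variable-length integer format, and the assumption that counts are non-negative and already sorted in ascending order are all hypothetical choices made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical illustration only; not the actual TypeBuilder output.
public class DeltaCounts {
    // Writes a non-negative int in a variable-length form, 7 bits per byte.
    // Small values (such as small differences) take a single byte.
    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Reads a value written by writeVInt.
    static int readVInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Assumes sortedCounts is non-negative and ascending, so every
    // difference from the previous count is non-negative.
    static byte[] encode(int[] sortedCounts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVInt(out, sortedCounts.length);
        int previous = 0;
        for (int count : sortedCounts) {
            writeVInt(out, count - previous); // store the difference, not the count
            previous = count;
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Rebuilds the original counts by summing the stored differences.
    static int[] decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = readVInt(in);
        int[] counts = new int[n];
        int previous = 0;
        for (int i = 0; i < n; i++) {
            previous += readVInt(in);
            counts[i] = previous;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        int[] counts = {1000, 1000, 1003, 1010, 1500};
        byte[] encoded = encode(counts);
        System.out.println(encoded.length + " bytes encoded, "
                + counts.length * 4 + " bytes as fixed-width ints");
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```

The point of the sketch is that sorting by count makes adjacent values close together, so the differences are small and a variable-length encoding spends few bytes on each one.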

Yes, this sounds very interesting. Does it build on the Record IO classes or is it completely separate?

I'm afraid it's completely separate, although it's not much code. The TypeBuilder is ~600 lines of code right now, plus maybe 500 lines of additional support (compression classes, etc.).

It can't be considered a drop-in replacement for the record stuff, since you've already got C++ support and complex record types. I don't know if it even makes sense to try to integrate the code I have, or if it should just serve as a proof of concept for a feature.

Trevor


