On Oct 11, 2006, at 12:32 PM, Doug Cutting wrote:
Yes. We'd like Hadoop's MapReduce to be able to live on top of
such systems [Grid Engine]. Some are already experimenting with
Hadoop on Condor, but I've not yet heard of anyone using Hadoop on
Sun's Grid Engine.
http://issues.apache.org/jira/browse/HADOOP-428
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt
[...] I'm also interested in supporting Amazon's EC2 [...]
That's good to hear. For our own hardware, the critical issue is how
we can share the resources efficiently with other people. Right now
there are lots of people using these machines, and I'm the only one
using MapReduce. Some people want to use MPI, some want to run
standard applications that use NFS, etc. Grid Engine almost
completely solves this sharing problem for us.
EC2 support sounds exciting.
Record I/O: [ ...]
and my TypeBuilder class generates code for all possible orderings
of this class (order by word, order by count, order by word then
count, order by count then word). Each ordering has its own hash
function and comparator.
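
To make the idea of per-ordering comparators and hash functions concrete, here is a minimal hand-written sketch in modern Java. It is not the TypeBuilder's generated output: the WordCount class, its field names, and the constant and method names are all hypothetical, chosen only to mirror the four orderings listed above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WordCountOrderings {
    // Hypothetical record type with the two fields discussed above.
    public static final class WordCount {
        public final String word;
        public final int count;

        public WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // One comparator per ordering (names are illustrative assumptions).
    public static final Comparator<WordCount> BY_WORD =
            Comparator.comparing((WordCount w) -> w.word);
    public static final Comparator<WordCount> BY_COUNT =
            Comparator.comparingInt((WordCount w) -> w.count);
    public static final Comparator<WordCount> BY_WORD_THEN_COUNT =
            BY_WORD.thenComparing(BY_COUNT);
    public static final Comparator<WordCount> BY_COUNT_THEN_WORD =
            BY_COUNT.thenComparing(BY_WORD);

    // Each ordering hashes only the fields it sorts on, so records that
    // compare equal under that ordering also hash to the same value.
    public static int hashByWord(WordCount w) {
        return w.word.hashCode();
    }

    public static int hashByCount(WordCount w) {
        return Integer.hashCode(w.count);
    }

    public static int hashByWordAndCount(WordCount w) {
        return 31 * w.word.hashCode() + Integer.hashCode(w.count);
    }

    public static void main(String[] args) {
        List<WordCount> records = new ArrayList<>();
        records.add(new WordCount("pear", 2));
        records.add(new WordCount("apple", 7));
        records.add(new WordCount("fig", 2));

        // Sort by count first, breaking ties alphabetically by word.
        records.sort(BY_COUNT_THEN_WORD);
        for (WordCount r : records) {
            System.out.println(r.count + " " + r.word);
        }
    }
}
```

Hashing only the fields that participate in an ordering is one plausible reason for giving each ordering its own hash function: a partitioner keyed on that hash would then group records the same way the comparator does.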
In addition, each ordering has its own serialization/deserialization
code. For example, if we order by count, the serialization code
stores only differences between adjacent counts to help with
compression.
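
The "store only differences between adjacent counts" idea is commonly called delta encoding. A minimal sketch follows, assuming the counts are already sorted in ascending order. The DeltaCodec class and its method names are hypothetical and are not taken from the code described here.

```java
import java.util.Arrays;

public class DeltaCodec {
    // Replace each value with its difference from the previous value.
    // The first value is stored as a difference from zero.
    public static int[] encode(int[] sortedCounts) {
        int[] deltas = new int[sortedCounts.length];
        int previous = 0;
        for (int i = 0; i < sortedCounts.length; i++) {
            deltas[i] = sortedCounts[i] - previous;
            previous = sortedCounts[i];
        }
        return deltas;
    }

    // Rebuild the original values by keeping a running sum of the deltas.
    public static int[] decode(int[] deltas) {
        int[] counts = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            counts[i] = running;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = {3, 3, 5, 9, 10};
        int[] deltas = encode(counts);
        System.out.println(Arrays.toString(deltas));          // [3, 0, 2, 4, 1]
        System.out.println(Arrays.toString(decode(deltas)));  // [3, 3, 5, 9, 10]
    }
}
```

When records are sorted by count, the deltas tend to be small non-negative numbers, which a variable-length integer encoding or a general-purpose compressor can usually store in fewer bytes than the raw counts.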
Is this code you'd be interested in?
Yes, this sounds very interesting. Does it build on the Record IO
classes or is it completely separate?
I'm afraid it's completely separate, although it's not much code.
The TypeBuilder is ~600 lines of code right now, plus maybe 500 lines
of additional support (compression classes, etc.).
It can't be considered a drop-in replacement for the record stuff--
you've already got C++ support and complex record types. I don't
know if it even makes sense to try to integrate the code I have, or
if it should just serve as a proof of concept for a feature.
Trevor