The usage scenario I imagine is that one allocates all the hosts in a
cluster, runs some MapReduce computations, then de-allocates all hosts.
I didn't think one would normally re-size the cluster in the middle of
a computation. Why would you want to do that?
Doug
Lee wrote:
I was also contemplating EC2 with regard to Hadoop. One of the issues I was
thinking of was, assuming you are dynamically allocating and deallocating
hosts, would you need to be careful of how fast you released hosts? Is
there currently any graceful way of letting Hadoop deal with removing a
host? That is, letting any MapReduce tasks finish and moving chunks to another
box.
Lee
On 10/11/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Trevor Strohman wrote:
> Grid Engine: All the machines available to me run Sun's Grid Engine for
> job submission. Grid Engine is important for us, because it makes sure
> that all of the users of a cluster get their fair share of resources--as
> far as I can tell, the JobTracker assumes that one user owns the
> machines. Is this shared scenario one you're interested in supporting?
Yes. We'd like Hadoop's MapReduce to be able to live on top of such
systems. Some are already experimenting with Hadoop on Condor, but I've
not yet heard of anyone using Hadoop on Sun's Grid Engine.
http://issues.apache.org/jira/browse/HADOOP-428
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt
> Would you consider supporting job submission systems like Grid Engine or
> Condor?
Definitely. I'm also interested in supporting Amazon's EC2, since it
removes the need to purchase and maintain a cluster. In particular,
Amazon's prices seem, for many applications, to be considerably
cheaper than operating one's own cluster.
> Record I/O: [ ...]
> and my TypeBuilder class generates code for all possible orderings of
> this class (order by word, order by count, order by word then count,
> order by count then word). Each ordering has its own hash function and
> comparator.
>
> In addition, each ordering has its own serialization/deserialization
> code. For example, if we order by count, the serialization code stores
> only differences between adjacent counts to help with compression.
>
> Is this code you'd be interested in?
Yes, this sounds very interesting. Does it build on the Record IO
classes or is it completely separate?
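
To make sure I understand the count-ordered serialization, here is a
minimal sketch of storing only the differences between adjacent counts.
It is not the TypeBuilder code and not a Hadoop API; the class name
DeltaCounts and its methods are made up for illustration, and it assumes
the counts are already sorted in ascending order, so every stored
difference is non-negative.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Hadoop or TypeBuilder.
final class DeltaCounts {
    // Replace each count with its difference from the previous count.
    // Assumes sortedCounts is in ascending order, so each delta is >= 0.
    static int[] encode(int[] sortedCounts) {
        int[] deltas = new int[sortedCounts.length];
        int previous = 0;
        for (int i = 0; i < sortedCounts.length; i++) {
            deltas[i] = sortedCounts[i] - previous;
            previous = sortedCounts[i];
        }
        return deltas;
    }

    // Rebuild the original counts by keeping a running sum of the deltas.
    static int[] decode(int[] deltas) {
        int[] counts = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            counts[i] = running;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = {3, 3, 5, 9, 10};
        int[] deltas = encode(counts);
        System.out.println(Arrays.toString(deltas));         // [3, 0, 2, 4, 1]
        System.out.println(Arrays.toString(decode(deltas))); // [3, 3, 5, 9, 10]
    }
}
```

The deltas are typically much smaller than the raw counts, so a
variable-length integer encoding or a general-purpose compressor should
be able to store them in fewer bytes.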
Thanks,
Doug