On Thu, 20 Dec 2007 18:46:58 PST, Kirk True wrote: 
>Hi all,
>
>A lot of the ideas I have for incorporating Hadoop into internal projects 
>revolve around distributing long-running tasks over multiple machines. I've 
>been able to get a quick prototype up in Hadoop for one of those projects, and 
>it seems to work pretty well. 
>
>However, in this project and the others, I'm not processing a lot of text or 
>mapping or reducing anything. I'm basically asynchronously processing a lot of 
>work over many machines in a master/worker paradigm rather than map/reduce.
>
>I have shown that I can achieve what I'm looking for with Hadoop. I just can't 
>get over the "feeling" that I'm shoe-horning it into a use it wasn't really 
>meant to do.
>
>We've done a similar project with Gigaspaces, but Hadoop seems to alleviate a 
>lot of that burden for us moving forward.

Ted responds:
}Map-reduce is just one way of organizing your computation.  If you have
}something simpler, then I would say that you are doing fine.
}
}There are plenty of tasks that are best served by a DAG of simple tasks.
}Systems like Amazon's simple queue (where tasks come back to life if they
}aren't "finished") provide a very natural control flow for these problems.
}
}You can phrase many of them as map-reduce programs with trivial reducers,
}but there really isn't much point in that if your inputs are large.
}
}One thing that you may be missing out on is task placement.  It would be
}nice to invoke code close to the inputs to avoid some network traffic.  That
}would be pretty easy to do, I would guess.
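[Editor's note: Ted's mention of Amazon's simple queue, where unfinished tasks "come back to life," can be sketched concretely. This is a toy in-memory illustration of visibility-timeout semantics, not any real queue API; the class and method names (`RequeueingTaskQueue`, `take`, `finish`) are made up for this example, and the current time is passed in explicitly to keep it deterministic.]

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an SQS-style task queue: a task handed to a
// worker stays "in flight" and is re-delivered if the worker does not
// acknowledge it before its visibility timeout expires.
class RequeueingTaskQueue<T> {
    private final Deque<T> pending = new ArrayDeque<>();
    private final Map<T, Long> inFlight = new HashMap<>(); // task -> redelivery deadline (ms)
    private final long visibilityTimeoutMs;

    RequeueingTaskQueue(long visibilityTimeoutMs) {
        this.visibilityTimeoutMs = visibilityTimeoutMs;
    }

    synchronized void submit(T task) {
        pending.addLast(task);
    }

    // Hand the next task to a worker, or return null if none is available.
    // The task "comes back to life" if finish() is not called in time.
    synchronized T take(long nowMs) {
        requeueExpired(nowMs);
        T task = pending.pollFirst();
        if (task != null) {
            inFlight.put(task, nowMs + visibilityTimeoutMs);
        }
        return task;
    }

    // Worker acknowledges completion; the task is gone for good.
    synchronized void finish(T task) {
        inFlight.remove(task);
    }

    // Move any task whose deadline has passed back onto the pending queue.
    private void requeueExpired(long nowMs) {
        inFlight.entrySet().removeIf(e -> {
            if (e.getValue() <= nowMs) {
                pending.addLast(e.getKey());
                return true;
            }
            return false;
        });
    }
}
```

A crashed or hung worker simply never calls `finish()`, so its task is picked up by the next `take()` after the timeout, which is the control flow Ted describes.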

Ted, your reply is good, but I think it misses the point of Kirk's question.

He's not saying "is Hadoop optimal" for things that aren't really
map/reduce, but "is it reasonable" for those things?
(Kirk, is that right?)

I think that's a good question, because (speaking for myself), the
overhead of maintaining a cluster compute system has always been high.
Often higher than the benefit.  So if I can just maintain Hadoop and use
it in two different ways, I win big over maintaining Hadoop and some
other master/worker system.

(That said, maybe the bar has moved.  My read of Google's papers is that
they run multiple systems, but then again they have some resources to
invest.  Or maybe other cluster compute systems are now easier to deploy
and maintain, and optimize, and control interactions with Hadoop,
and...  But I'd guess not :-) )

   -John Heidemann
