Ahhh.... My previous comments assumed that "long-lived" meant jobs that run
for days and days and days (essentially forever).

15-minute jobs with a finite work list are actually a pretty good match for
map-reduce as implemented by Hadoop.
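
To make that concrete, here is a minimal sketch (mine, not Kirk's actual setup)
of the finite-work-list pattern against the classic org.apache.hadoop.mapred
API: a map-only job whose input is a text file with one work item per line, and
with the reduce phase disabled. The class names, the doWork() placeholder, and
the input/output paths are all illustrative.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class WorkListJob {

  // Each map() call receives one line of the work list and runs that unit of work.
  public static class WorkMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {
    public void map(LongWritable offset, Text workItem,
                    OutputCollector<Text, NullWritable> out, Reporter reporter)
        throws IOException {
      reporter.setStatus("processing " + workItem);
      doWork(workItem.toString());                  // placeholder for the real task
      out.collect(new Text(workItem + "\tdone"), NullWritable.get());
    }

    private void doWork(String item) {
      // the actual ~15-minute (eventually ~1-minute) unit of work goes here
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WorkListJob.class);
    conf.setJobName("work-list");
    conf.setMapperClass(WorkMapper.class);
    conf.setNumReduceTasks(0);                      // map-only: no shuffle, no reduce
    conf.setInputFormat(NLineInputFormat.class);    // one work item per map task
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));  // file listing the work items
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

With NLineInputFormat each input line becomes its own map task, so the
scheduler spreads the finite list across whatever task slots are free, and a
failed item is simply retried by the normal map-attempt machinery.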


On 12/25/07 10:04 AM, "Kirk True" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> Thanks for all the replies thus far...
> 
> Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote: in many cases, long-running
> tasks have low cpu utilization. i have trouble imagining how these can mix well
> with cpu-intensive short/batch tasks. afaik, hadoop's job scheduling is not
> resource-usage aware; long background tasks would consume per-machine task
> slots and block other tasks from using available cpu bandwidth.
> 
> Maybe I should clarify things...
> 
> The jobs that we're presently trying to use Hadoop for are fairly long-lived
> (i.e. ~15 minutes) but -- to Chad's point -- they are finite.
> 
> That said, the long-livedness of the individual jobs is a "temporary" thing, in
> that we'll be making each job do less work so that each runs in ~1 minute.
> 
> To John's point, yes, the question is not 'is this optimal?' but 'is it
> reasonable to use a framework geared for map/reduce operations to simply
> distribute jobs over multiple machines?'
> 
> I've looked at a couple of other solutions for generic master/worker type of
> functionality, but we'd like to stick to an open source implementation.
> 
> Like I said before, I can get Hadoop to do what I need. But that doesn't make
> it "right" ;)
> 
> Thanks,
> Kirk
> 
> -----Original Message-----
> From: Chad Walters [mailto:[EMAIL PROTECTED]
> Sent: Sat 12/22/2007 2:39 PM
> To: [email protected]
> Subject: Re: Appropriate use of Hadoop for non-map/reduce tasks?
>  
> 
> I should further say that god functions only on a per-machine basis. We have
> then built a number of scripts that do auto-configuration of our various
> services, using configs pulled from LDAP and code pulled from our package
> repo. We use this to configure our various server processes and also to
> configure Hadoop clusters (HDFS and Map/Reduce). But god is a key part of the
> system, since it helps us provide a uniform interface for starting and
> stopping all our services.
> 
> Chad
> 
> 
> On 12/22/07 1:30 PM, "Chad Walters"  wrote:
> 
> I am not really sure that Hadoop is right for what Jeff is describing.
> 
> I think there may be two separate problems:
> 
>  1.  Batch tasks that may take a long time but are expected to have a finite
> termination
>  2.  Long-lived server processes that have an indefinite lifetime
> 
> For #1, we pretty much use Hadoop, although we have built a fairly extensive
> framework inside of these long map tasks to track progress and handle various
> failure conditions that can arise. If people are really interested, I'll poke
> around and see if any of it is general enough to warrant contributing back,
> but I think a lot of it is probably fairly specific to the kinds of failure
> cases we expect from the components involved in the long map task.
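
Chad's framework isn't shown in the thread, but a minimal sketch of the baseline
problem may help readers hitting the same issue: a long map task has to keep
reporting progress, or the TaskTracker kills it once mapred.task.timeout (10
minutes by default) passes without a heartbeat. This sketch uses the classic
org.apache.hadoop.mapred API; doWork() is a placeholder for the real work.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LongTaskMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  public void map(LongWritable key, Text workItem,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    final String item = workItem.toString();
    // Run the real (long) work on a separate thread so map() can heartbeat.
    Thread worker = new Thread(new Runnable() {
      public void run() { doWork(item); }      // placeholder for the real work
    });
    worker.start();
    try {
      while (worker.isAlive()) {
        reporter.progress();                   // heartbeat to the TaskTracker
        reporter.setStatus("still working on " + item);
        worker.join(30 * 1000L);               // re-check every 30 seconds
      }
    } catch (InterruptedException e) {
      throw new IOException("interrupted while waiting for work to finish");
    }
    out.collect(new Text(item), NullWritable.get());
  }

  void doWork(String item) {
    // the long-running unit of work goes here
  }
}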
> 
> For #2, we are using something called "god" (http://god.rubyforge.org/). One
> of our developers ended up starting this project because he didn't like monit.
> We liked the way it was going, and now we use it throughout our
> datacenter to start, stop, and health check our server processes. It supports
> both polling and event-driven actions and is pretty extensible. Check it out
> to see if it might satisfy some of your needs.
> 
> Chad
> 
> 
> On 12/22/07 11:40 AM, "Jeff Hammerbacher"  wrote:
> 
> yo,
> from my understanding, the map/reduce codebase grew out of the codebase for
> "the borg", google's system for managing long-running processes.  we could
> definitely use this sort of functionality, and the jobtracker/tasktracker
> paradigm goes part of the way there.  sqs really helps when you want to run
> a set of recurring, dependent processes (a problem our group definitely
> needs to solve), but it doesn't really seem to address the issue of managing
> those processes when they're long-lived.
> 
> for instance, when we deploy our search servers, we have a script that
> basically says "daemonize this process on this many boxes, and if it enters
> a condition that doesn't look healthy, take this action (like restart, or
> rebuild the index, etc.)".  given how hard-coded the task-type is into
> map/reduce (er, "map" and "reduce"), it's hard to specify new types of error
> conditions and running conditions for your processes.  also, the jobtracker
> doesn't have any high availability guarantees, so you could run into a
> situation where your processes are fine but the jobtracker goes down.
>  zookeeper could help here.  it'd be sweet if hadoop could handle this
> long-lived process management scenario.
> 
> kirk, i'd be interested in hearing more about your processes and the
> requirements you have of your process manager.  we're exploring other
> solutions to this problem and i'd be happy to connect you with the folks
> here who are thinking about the issue.
> 
> later,
> jeff
> 
> On Dec 21, 2007 12:42 PM, John Heidemann  wrote:
> 
>> On Fri, 21 Dec 2007 12:24:57 PST, John Heidemann wrote:
>>> On Thu, 20 Dec 2007 18:46:58 PST, Kirk True wrote:
>>>> Hi all,
>>>> 
>>>> A lot of the ideas I have for incorporating Hadoop into internal
>> projects revolve around distributing long-running tasks over multiple
>> machines. I've been able to get a quick prototype up in Hadoop for one of
>> those projects and it seems to work pretty well.
>>>> ...
>>> He's not saying "is Hadoop optimal" for things that aren't really
>>> map/reduce, but "is it reasonable" for those things?
>>> (Kirk, is that right?)
>>> ...
>> 
>> Sorry to double reply, but I left out my comment to (my view of) Kirk's
>> question.
>> 
>> In addition to what Ted said, I'm not sure how well Hadoop works with
>> long-running jobs, particularly how well that interacts with its fault
>> tolerance code.
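
As far as I know, the fault-tolerance interaction here is mostly the per-task
inactivity timeout plus the per-task attempt limit. A hedged sketch of those two
knobs via JobConf follows; the 30-minute value is only an example, and raising
the timeout is an alternative (or complement) to reporting progress from inside
the map task.

import org.apache.hadoop.mapred.JobConf;

public class LongTaskSettings {
  // Returns the JobConf with the two settings most relevant to long map tasks.
  public static JobConf forLongTasks(JobConf conf) {
    // mapred.task.timeout is in milliseconds; the default is 600000 (10 minutes).
    conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
    // A map task gets this many attempts before the whole job fails (4 is the default).
    conf.setMaxMapAttempts(4);
    return conf;
  }
}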
>> 
>> And more generally, if you're not doing map/reduce then you'd probably
>> have to build your own fault tolerance methods.
>> 
>>   -John Heidemann
>> 
>> 
> 
