Great. Let's start with this:
http://www.amazon.com/Simple-Queue-Service-home-page/b?ie=UTF8&node=13584001

Just the basics. The way SQS works is:

- you define a "queue" that has a name
- you add "tasks" to the queue. These are really just small documents that your
  workers will understand.
- a worker atomically removes an item from the head of the queue. The item is
  not deleted outright, but is instead put in a holding pen for a period of
  time, after which it is returned to the queue.
- if the worker finishes work on the item, it deletes the item from the queue
  or the holding pen, depending on whether the timeout has expired.
- if the worker dies before signaling completion of work on the task, the task
  will eventually be returned to the queue and handed out to another worker.
- the worker is responsible for accessing any specified input resources, saving
  any results, and scheduling any follow-on work.
- there is potential for a race condition when additional work is to be
  scheduled. If the scheduling is done before deleting the item, there is a
  thin possibility that the item could be handed out again. If the scheduling
  is done after the item is deleted, the worker could crash and lose the item.

I think the best way to avoid problems is to have workers check for the
existence of a completion flag before starting work and before saving results.
That makes double processing non-fatal.

On 12/21/07 12:51 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 21, 2007 at 12:43:38PM -0800, Ted Dunning wrote:
>>
>> * if you need some kind of work-flow, hadoop won't help (but it won't hurt
>> either)
>>
>
> Let's start a discussion around this, seems to be something lots of folks
> could use...
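For anyone who wants to see the visibility-timeout mechanics concretely, here
is a minimal in-memory sketch of the semantics described above (this is a toy
simulation, not the SQS API; the class and method names are made up, and time
is passed in explicitly so the behavior is easy to trace):

```python
import time

class VisibilityQueue:
    """Sketch of SQS-style semantics: receive() atomically pops the head
    of the queue into a holding pen; unless the item is deleted before
    the visibility timeout expires, it returns to the queue and can be
    handed out to another worker."""

    def __init__(self, visibility_timeout=30.0):
        self.timeout = visibility_timeout
        self.items = {}       # item id -> task document
        self.ready = []       # ids currently available to workers
        self.in_flight = {}   # id -> deadline (the "holding pen")
        self._next_id = 0

    def send(self, task):
        self._next_id += 1
        self.items[self._next_id] = task
        self.ready.append(self._next_id)
        return self._next_id

    def _requeue_expired(self, now):
        # Items whose timeout has lapsed go back to the head of the queue.
        for item_id, deadline in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[item_id]
                self.ready.insert(0, item_id)

    def receive(self, now=None):
        now = time.monotonic() if now is None else now
        self._requeue_expired(now)
        if not self.ready:
            return None
        item_id = self.ready.pop(0)   # atomic removal of the head item
        self.in_flight[item_id] = now + self.timeout
        return item_id, self.items[item_id]

    def delete(self, item_id):
        # Worker signals completion. This works whether the item is still
        # in the holding pen or has already fallen back into the queue,
        # matching the "queue or holding pen" case above.
        self.in_flight.pop(item_id, None)
        if item_id in self.ready:
            self.ready.remove(item_id)
        self.items.pop(item_id, None)

q = VisibilityQueue(visibility_timeout=5.0)
q.send("resize image 1")
first = q.receive(now=0.0)        # worker A takes the item, then "dies"
assert q.receive(now=1.0) is None # item is invisible inside the timeout
second = q.receive(now=6.0)       # timeout lapsed: worker B gets the same item
assert second[0] == first[0]
q.delete(second[0])               # worker B finishes and deletes it
assert q.receive(now=7.0) is None
```

Note that delete() is deliberately tolerant of the item having already
returned to the queue; combined with the completion-flag check suggested
above, that is what makes occasional double processing harmless rather
than fatal.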