We need a generic task queue for our execution framework. Our specific
needs for a task queue include:
1.      Interaction with the beanstalk server is done via HTTP requests.
2.      Support tubes.
3.      Hand out tasks based on their priority.
4.      Put a reserved task back on the ready queue if its execution time
exceeds the TTR.
5.      Allow the job scheduler to specify a max-retry count for a task. For
example, attempt to execute a task at most 3 times if execution fails
or times out.
6.      Allow the job scheduler to populate the queue with jobs that should
run some amount of time later.
7.      Allow the job scheduler to specify an expiration time for a task,
after which the task will be deleted from the queue (at least from the
point of view of external users of the HTTP service).
8.      Delete or release a reserved job based on a status code.
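
To make the mapping concrete: requirements 3, 4 and 6 correspond to the
priority, TTR and delay arguments of the beanstalk "put" command, whose
wire format is "put <pri> <delay> <ttr> <bytes>" followed by the job body.
Below is a minimal sketch that only builds that command text. The class
name PutCommand and its build method are hypothetical helpers for
illustration, not part of JavaBeanstalkClient or any other library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any client library.
final class PutCommand {
    // Builds the text of a beanstalk "put" command:
    //   put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n
    // priority:     smaller numbers are handed out first (requirement 3)
    // delaySeconds: seconds to wait before the job becomes ready (requirement 6)
    // ttrSeconds:   time-to-run allowed once the job is reserved (requirement 4)
    static String build(long priority, long delaySeconds, long ttrSeconds, String body) {
        // The protocol wants the body length in bytes, not characters.
        int bytes = body.getBytes(StandardCharsets.UTF_8).length;
        return "put " + priority + " " + delaySeconds + " " + ttrSeconds + " " + bytes
                + "\r\n" + body + "\r\n";
    }
}
```

For example, PutCommand.build(10, 60, 120, "hello") describes a job with
priority 10 that becomes ready after 60 seconds and has a TTR of 120
seconds.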

We did some research and wanted to use beanstalk as our task queue.
Since the current beanstalk doesn’t provide all the features we need,
we decided to add a wrapper around it. The wrapper would provide the
following features:
a.      Provide a REST interface for beanstalk requests.
b.      Support max-retry mechanism.
c.      Support expiration of tasks.
d.      Decide whether to delete or release a job based on the status code
(negative = release; otherwise delete)
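
Features b, c and d can be combined into one decision made when a worker
reports back. The sketch below is only one way to read them, under
assumptions that are not stated above: the wrapper itself tracks the
attempt count and the expiration timestamp for each job, an expired job is
always deleted, and a failed job whose attempts are used up is deleted
rather than released. The names JobPolicy, Action and decide are
hypothetical.

```java
// Hypothetical decision logic for the wrapper; names and parameters are assumptions.
final class JobPolicy {
    enum Action { DELETE, RELEASE }

    // statusCode:      reported by the worker; negative means the execution failed
    // attempts:        number of executions so far, including the one just reported
    // maxAttempts:     upper bound on executions (e.g. 3)
    // nowMillis:       current time
    // expiresAtMillis: time after which the job must no longer be handed out
    static Action decide(int statusCode, int attempts, int maxAttempts,
                         long nowMillis, long expiresAtMillis) {
        if (nowMillis >= expiresAtMillis) {
            return Action.DELETE;          // feature c: expired jobs are dropped
        }
        if (statusCode >= 0) {
            return Action.DELETE;          // feature d: non-negative status means delete
        }
        // feature d says negative means release; feature b caps how often we do that
        return attempts < maxAttempts ? Action.RELEASE : Action.DELETE;
    }
}
```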

We tried to implement this wrapper with JavaBeanstalkClient. Our
initial implementation shared one connection for all the requests.
Then we realized we would face a race condition with this approach.
For example, two requests come in for jobs: one is fetching from
tube A, the other from tube B. If the two requests interleave with
each other, they may end up fetching from the same tube.
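
The race can be sketched as follows. Selecting a tube and reserving a job
are two separate steps on a connection, so on a shared connection they
have to be made atomic. FakeConnection here is a deliberately simplified
stand-in that remembers a single watched tube; it is not the
JavaBeanstalkClient API, and SharedConnection is likewise a hypothetical
name.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified stand-in for one beanstalk connection; NOT a real client API.
final class FakeConnection {
    private String watchedTube = "default";

    void watch(String tube) { watchedTube = tube; }

    // Returns a label showing which tube the job was taken from.
    String reserve() { return "job-from-" + watchedTube; }
}

final class SharedConnection {
    private final FakeConnection conn = new FakeConnection();
    private final ReentrantLock lock = new ReentrantLock();

    // Unsafe: another thread can call watch() between these two statements,
    // so a request for tube A may receive a job from tube B.
    String reserveUnsafe(String tube) {
        conn.watch(tube);
        return conn.reserve();
    }

    // Safe, but every request now waits its turn on the single connection.
    String reserveFrom(String tube) {
        lock.lock();
        try {
            conn.watch(tube);
            return conn.reserve();
        } finally {
            lock.unlock();
        }
    }
}
```

The lock removes the race at the cost of serializing all requests on the
one connection.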

Furthermore, we are concerned with performance. We don’t know how
this will scale with a single connection. So we went with the one
connection per thread option, and found out that a reserved job wouldn’t
get deleted, because reservation and deletion must be done via the same
beanstalk connection. However, they are done via two HTTP requests from
the worker node, which means they come through two different connections.
The current implementation of beanstalk does not delete a job if the
deletion is done through a different connection from the one that
reserved it.
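
One possible way around this, sketched here only as an idea and not
something we have tested, is for the wrapper to remember which connection
reserved each job and to route the later delete or release through that
same connection. This assumes the wrapper keeps its beanstalk connections
open between the two HTTP requests. ReservationRegistry and its methods
are hypothetical names; the type parameter C stands for whatever
connection object the client library provides.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping: maps a reserved job id to the connection that reserved it.
final class ReservationRegistry<C> {
    private final Map<Long, C> ownerByJobId = new ConcurrentHashMap<>();

    // Call right after a successful reserve on the given connection.
    void recordReservation(long jobId, C connection) {
        ownerByJobId.put(jobId, connection);
    }

    // Returns the connection that reserved the job and forgets the entry,
    // or null if the job id is unknown (never reserved here, or already handled).
    C takeOwner(long jobId) {
        return ownerByJobId.remove(jobId);
    }
}
```

The handler for the second HTTP request would call takeOwner(jobId) and
issue the delete or release on the returned connection, treating a null
result as "job not reserved through this wrapper".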

In addition to the problem mentioned above, we’ve also discovered that
beanstalk is not able to enforce TTR when there is no active
connection.

I know a lot of you have good experience with beanstalk. Can you give
some suggestions on how I should use beanstalk to fulfill our specific
needs? Are there any good client libraries you would recommend?

Also, I read one post at
http://groups.google.com/group/beanstalk-talk/browse_thread/thread/bf53b267e9e72830#,
and it seems the Ruby client supports a beanstalk server cluster, which
would improve the availability of the task queue. I wonder if a similar
feature is available in the Java or C++ client libraries?

Thanks a lot.
Iris
