We need a generic task queue for our execution framework. Our specific requirements for the task queue are:

1. Interaction with the beanstalk server is done via HTTP requests.
2. Support tubes.
3. Hand out tasks based on their priority.
4. Put a reserved task back on the ready queue if its execution time exceeds the TTR.
5. Allow the job scheduler to specify a max-retry count for a task. For example, attempt to execute a task at most 3 times if execution fails or times out.
6. Allow the job scheduler to populate the queue with jobs that should run some amount of time later.
7. Allow the job scheduler to specify an expiration time for a task, after which the task is deleted from the queue (at least from the point of view of an external user of the HTTP service).
8. Delete or release a reserved job based on a status code.
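
To make requirements 5, 7 and 8 concrete, here is a minimal Java sketch of the decision the wrapper would make when a worker reports back. All names (JobDisposition, JobMeta, onReport) are hypothetical and are not part of beanstalk or any client library. It assumes that a negative status code means a failed attempt, and that the wrapper itself keeps the attempt count and expiration time for each job.

```java
import java.time.Instant;

// Hypothetical sketch: none of these names come from beanstalk or JavaBeanstalkClient.
final class JobDisposition {
    enum Action { DELETE, RELEASE }

    // Per-job bookkeeping that the wrapper is assumed to keep itself.
    static final class JobMeta {
        final int maxRetries;     // maximum number of execution attempts
        final Instant expiresAt;  // at or after this instant the job is dropped
        int attempts;             // attempts reported so far

        JobMeta(int maxRetries, Instant expiresAt) {
            this.maxRetries = maxRetries;
            this.expiresAt = expiresAt;
        }
    }

    // Called when a worker reports the status code of one attempt.
    static Action onReport(JobMeta meta, int statusCode, Instant now) {
        meta.attempts++;
        if (statusCode >= 0) {
            return Action.DELETE;   // non-negative status: job is done
        }
        if (!now.isBefore(meta.expiresAt)) {
            return Action.DELETE;   // failed, but the job has expired
        }
        if (meta.attempts >= meta.maxRetries) {
            return Action.DELETE;   // failed, and no attempts are left
        }
        return Action.RELEASE;      // failed: put it back on the ready queue
    }
}
```

With maxRetries set to 3, the first two negative reports give RELEASE and the third gives DELETE. A TTR timeout never reaches this method, so timed-out attempts would have to be counted some other way.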
We did some research and wanted to use beanstalk as our task queue. Since the current beanstalk doesn't provide all the features we need, we decided to add a wrapper around it. The wrapper would provide the following features:

a. Provide a REST interface for beanstalk requests.
b. Support a max-retry mechanism.
c. Support expiration of tasks.
d. Decide whether to delete or release a job based on the status code (negative = release; otherwise delete).

We tried to implement this wrapper with JavaBeanstalkClient. Our initial implementation shared one connection for all requests. Then we realized that this approach has a race condition. For example, two requests come in for jobs: one fetches from tube A, the other from tube B. If the two requests interleave, they may end up fetching from the same tube. Furthermore, we are concerned about performance: we don't know how well this would scale with a single connection.

So we went with the one-connection-per-thread option, and found that a reserved job would not get deleted, because reservation and deletion must be done over the same beanstalk connection. In our case they are done via two separate HTTP requests from the worker node, which means they arrive on two different connections. The current implementation of beanstalk does not delete a job if the deletion comes through a different connection from the one that reserved it.

In addition to the problem mentioned above, we've also discovered that beanstalk is not able to enforce the TTR when there is no active connection.

I know a lot of you have good experience with beanstalk. Can you give some suggestions on how we should use beanstalk to fulfill our specific needs? Are there any good client libraries you would recommend?

Also, I read a post at http://groups.google.com/group/beanstalk-talk/browse_thread/thread/bf53b267e9e72830#, and it seems the Ruby client supports a cluster of beanstalk servers, which would improve the availability of the task queue.
I wonder if a similar feature is available in the Java or C++ client libraries?

Thanks a lot.
Iris
