Hi Keith,

On Tuesday, October 30, 2012 11:07:37 PM UTC+1, Keith Rarick wrote:
>
> Here's my advice for dealing with failed jobs: 
> http://xph.us/2010/05/02/how-to-handle-job-failures. 


Thanks for the link (that should be 
http://xph.us/2010/05/02/how-to-handle-job-failures.html by the way, to 
help others). Interesting to read your solution! I am going to take a few 
snippets out of that post if you don't mind.

1. "See what sorts of failures happen in production": yes, I have logging 
for my workers. In fact, my webapp is half HTTP, half CLI, and the CLI 
half also takes care of the workers. I have app-wide logging to various 
channels, so I log the workers too. However, how do you relate the errors 
from jobs to your log? I want to create a web interface with information 
about the buried jobs. For that I would have to open my log file, parse it, 
peek at the buried jobs (preferably all at once), match the job ids to the 
log entries and show the result to the user.

I log the exceptions too, but to make the process more comfortable, I see 
nothing against storing the exception type + message in the job data 
itself as well. It only gets more complicated with the advice I see here 
a lot: "just delete the job and put a new one back". Ids are lost and 
tracking becomes near-impossible.
 
2. "It might also make sense to retry some jobs only a limited number of 
times before deleting them": the only method I can come up with is to 
delete the job and put a clone back. Releasing a job with a delay can't be 
done for the above reasons (i.e., where do you store the counter?).

3. "For retries, [...], but do add a time delay with exponential backoff": 
same as above, where do you store this logic?
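
To make points 2 and 3 concrete, here is a rough sketch of the "delete the 
job and put a clone back" approach with the counter kept in the job body 
itself. It is Python purely for illustration and does not talk to 
beanstalkd; the envelope fields (`attempts`, `payload`), the constants and 
the function names are all hypothetical.

```python
import json

MAX_ATTEMPTS = 5    # hypothetical: give up after this many failed runs
BASE_DELAY = 2      # hypothetical: delay in seconds before the first retry
MAX_DELAY = 3600    # hypothetical: upper bound for the backoff delay


def wrap(payload, attempts=0):
    """Serialize the payload together with a failure counter."""
    return json.dumps({"attempts": attempts, "payload": payload})


def next_attempt(body):
    """Return (new_body, delay_seconds) for the clone, or None to give up."""
    job = json.loads(body)
    attempts = job["attempts"] + 1          # one more failed run
    if attempts >= MAX_ATTEMPTS:
        return None                         # caller deletes or buries the job
    # Exponential backoff: 2, 4, 8, ... seconds, capped at MAX_DELAY.
    delay = min(BASE_DELAY * 2 ** (attempts - 1), MAX_DELAY)
    return wrap(job["payload"], attempts), delay
```

On failure the worker would call `next_attempt()` with the body of the 
reserved job, delete the original, and put the returned body with the 
returned delay (the put command takes a delay). The clone still gets a new 
id, which is exactly the tracking problem from point 1.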

Ad 1: If you want to minimize errors (always good) and thus shrink the 
buried queue (also good), you probably need more data. The complete stack 
trace can be logged to your real logging service, and if that is your 
goal, you use the log to process the information. But for a quick look at 
why a job was buried, that is much too complicated, I think.
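
As an illustration of the quick-inspection idea, a minimal sketch of 
keeping the exception type + message inside a JSON job body. Python is 
used only as an example; the `last_error` field and both function names 
are hypothetical, and how the annotated body gets back into the queue is 
left open here.

```python
import json


def annotate_failure(body, exc):
    """Return a copy of the JSON job body with the failure reason attached."""
    job = json.loads(body)
    job["last_error"] = {
        "type": type(exc).__name__,   # e.g. "ValueError"
        "message": str(exc),
    }
    return json.dumps(job)


def describe(body):
    """One-line summary of a job's failure, e.g. for a web interface."""
    err = json.loads(body).get("last_error")
    if err is None:
        return "no failure recorded"
    return f"{err['type']}: {err['message']}"
```

A web interface could then peek at a buried job and call `describe()` on 
its body without touching the log files; the full stack trace would still 
live in the log.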

> Instead of adding features to beanstalkd to handle other 
> things (such as logging stack traces, or tracking the 
> application's long-term state), it's better to keep beanstalkd 
> itself focused on scheduling work to be done, and leave 
> those other things to other tools. 
>

I completely agree. The simplicity of beanstalk is something I really 
appreciate. However, I think the described enhancements could make 
beanstalk more useful without losing focus, compromising on memory 
footprint or ending up as a clumsy one-size-fits-all solution. 

> If you have a work item that has two separate phases 
> of execution, and those phases can fail independently, 
> it might make sense to break it apart into two jobs. 


It's what we do already :) We now have a custom solution, a Node.js app 
that schedules work using Redis as the queue service. All jobs are atomic, 
to avoid chains of failures. 

> Having said all that, we definitely need ways to get better 
> visibility into beanstalkd's internal state while it's running 
> in production. 
>

I would really like to hear more of your thoughts on this. If you have more 
information, ideas or developments, please share :)
---
Jurian Sluiman 
