"J. Austin Hughey" <[email protected]> wrote:
> The general idea is that I'd like to have some way to "warn" the
> application when it's about to be killed. I've patched
> murder_lazy_workers to send an abort signal via kill_worker, sleep for
> 5 seconds, then look and see if the process still exists by using
> Process.getpgid. If so, it sends the original kill command, and if
> not, a rescue block kicks in to prevent the raised error from
> Process.getpgid from making things explode.

The problem with anything other than SIGKILL (or SIGSTOP) is that it
assumes the Ruby VM is working and in a good state.

> I've created a simulation app, built on Rails 3.0, that uses a generic
> "posts" controller to simulate a long-running request. Instead of
> just throwing a straight-up sleep 65 in there, I have it running
> through sleeping 1 second on a decrementing counter, and doing that 65
> times. The reason is because, assuming I've read the code correctly,
> even with my "skip sleeping workers" commented line below, it'll skip
> over the process, thus rendering my simulation of a long-running
> process invalid. However, clarification on this point is certainly
> welcome. You can see the app here:
> https://github.com/jaustinhughey/unicorn_test/blob/master/app/controllers/posts_controller.rb

(purely for educational purposes, since I'll point you towards another
approach I believe is better)

    Signal.trap(:ABRT) do
      # Write some stuff to the Rails log
      logger.info "Caught Unicorn kill exception!"

If this is the logger that ships with Ruby, it locks a Mutex, so it'll
deadlock if another SIGABRT is received while logging the above
statement (a very small window, admittedly).

      # Do a controlled disconnect from ActiveRecord
      ActiveRecord::Base.connection.disconnect!

Likewise, if AR needs to lock internal structures before disconnecting,
it also must be reentrant. Ruby's normal Mutex implementation is not
reentrant-safe.

> So it looks like Worker 1 is hitting a strange/false timeout of
> 1315467289 seconds, which isn't really possible as it wasn't even
> running 1315467289 seconds prior to that (which equates to roughly 41
> years ago if my math is right).

You're getting this because you removed the following line:

    0 == tick and next # skip workers that are sleeping

"Sleeping" here means the worker hasn't accepted a client connection
yet, not that it is sleeping while processing a client request. I'll
clarify that in the code.

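A rough sketch of why removing that guard produces the bogus timeout,
assuming the check boils down to subtracting the worker's last tick from
the current epoch time (the method name, the 60-second TIMEOUT and the
keyword argument below are made up for illustration; this is not the
actual Unicorn code):

```ruby
# Illustrative only: a hypothetical stand-in for the per-worker timeout check.
TIMEOUT = 60 # seconds (assumed value)

# Returns how many seconds the worker has gone without ticking if that
# exceeds TIMEOUT, or nil if the worker should be left alone.
def overdue_by(now, tick, skip_sleeping: true)
  # A tick of 0 means the worker has not accepted a client connection yet.
  return nil if skip_sleeping && tick == 0
  diff = now - tick
  diff > TIMEOUT ? diff : nil
end

now = 1315467289 # the "timeout" from the log, read as epoch seconds

with_guard    = overdue_by(now, 0)                       # idle worker is skipped
without_guard = overdue_by(now, 0, skip_sleeping: false) # now - 0 == now
busy          = overdue_by(now, now - 65)                # really stuck for 65s

log_time = Time.at(now).utc # falls in September 2011
```

With the guard removed, an idle worker's tick of 0 makes the difference
equal to the current time in epoch seconds, which is the "roughly 41
years" you computed.
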
> Needless to say, I'm a bit stumped at this point, and would sincerely
> appreciate another point of view on this. Am I going about this all
> wrong? Is there a better approach I should consider? And if I'm on
> the right track, how can I get this to work regardless of how many
> Unicorn workers are running?

Since it's an application error, it can be done as middleware. You can
try something like the Rainbows::ThreadTimeout middleware; it's
currently Rainbows! specific but can easily be made to work with
Unicorn.

    git clone git://bogomips.org/rainbows
    cat rainbows/lib/rainbows/thread_timeout.rb

This is conceptually similar to "timeout" in the Ruby standard library,
but does not allow nesting.

I'll try to clarify more later today if you have questions; I'm in a
bit of a rush right now.

-- 
Eric Wong
_______________________________________________
Unicorn mailing list - [email protected]
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying
