We're still suffering from unexplained worker timeouts. We recently upgraded to the latest unicorn, 4.6.3 (while still on REE 1.8.7), in the hope that it would solve our issues. Unfortunately, this seemed to exacerbate the problem: timeouts now happen more frequently, though that could be related to the greater timeout precision in newer versions of unicorn. (In our unicorn 3.6.2, a timeout set to 120s might not ACTUALLY time out until 180s or more, thus allowing a bit more time for Ruby to finish whatever it was choking on.)
We dropped the timeout down to 65s (to make sure it was triggered) and then tried to add more logging (per http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269). The START/FINISH approach confirms it's not an issue with our application code, e.g.:

  HH:MM:SS- S/F[PID]- /PATH
  15:21:01- START-25904- /pathA
  15:21:01- FINISH-25904- /pathA
  15:21:01- START-25904- /pathB
  15:21:01- FINISH-25904- /pathB
  15:21:01- START-25904- /pathC
  15:21:01- FINISH-25904- /pathC
  worker=11 PID:25904 timeout (66s > 65s), killing
  reaped #<Process::Status: pid=25904,signaled(SIGKILL=9)> worker=11

For each START we always get a corresponding FINISH, and then the worker is killed. Additionally, our nginx logs confirm that this last request was sent back to the client. There are no 'upstream' errors in our nginx log, either.

When we tried the Thread sleep approach, nothing actually appeared in the logs. I imagine this means that Ruby or some C extension is misbehaving. Unfortunately, it's been impossible for us to recreate this in development.

Thoughts?

  RHEL 5.6
  REE 1.8.7 2011.12
  Unicorn 4.6.3
  16 unicorn workers on 8 cores
  No swap activity, no peaks in load

Again, thanks for all your help!

-Nick
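
For anyone wanting to try the same thing, START/FINISH logging along these lines can be sketched as a small Rack middleware. This is only an illustration, not necessarily the code from the linked post: the class name StartFinishLogger and the exact line format are assumptions made up to match the log excerpt above.

```ruby
# Hypothetical middleware (the name is illustrative): writes one START line
# before the app is called and one FINISH line after it returns.
class StartFinishLogger
  def initialize(app, io = $stderr)
    @app = app
    @io = io
  end

  def call(env)
    log('START', env)
    @app.call(env)
  ensure
    # ensure runs even when the app raises, so a START without a FINISH
    # means the worker never came back from the app call.
    log('FINISH', env)
  end

  private

  def log(tag, env)
    stamp = Time.now.strftime('%H:%M:%S')
    @io.write("#{stamp}- #{tag}-#{Process.pid}- #{env['PATH_INFO']}\n")
  end
end
```

It would be enabled with `use StartFinishLogger` in config.ru. Note that FINISH is written when the app's `call` returns, which is before the server writes the response body to the client, so a FINISH line by itself does not show that the response was fully sent.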
