We're still seeing unexplained worker timeouts.  We recently upgraded to the
latest unicorn, 4.6.3 (while still on REE 1.8.7), in the hope that it would
solve our issues.  Unfortunately, this seemed to exacerbate the problem, with
timeouts happening more frequently, though that could be down to more precise
timeout enforcement in newer versions of unicorn.  (With our unicorn 3.6.2, a
worker with a 120s timeout might not ACTUALLY be timed out until 180s or more,
allowing a bit more time for Ruby to finish whatever it was choking on.)

We dropped the timeout to 65s (to make sure it was triggered) and then
added more logging (per
http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269).  The
START/FINISH approach confirms it's not an issue with our application code, e.g.:

HH:MM:SS- S/F[PID]- /PATH
15:21:01- START-25904- /pathA
15:21:01- FINISH-25904- /pathA
15:21:01- START-25904- /pathB
15:21:01- FINISH-25904- /pathB
15:21:01- START-25904- /pathC
15:21:01- FINISH-25904- /pathC
worker=11 PID:25904 timeout (66s > 65s), killing
reaped #<Process::Status: pid=25904,signaled(SIGKILL=9)> worker=11

For each START we always get a corresponding FINISH and then the worker is 
killed.  Additionally, our nginx logs confirm that this last request was sent 
back to the client.  No 'upstream' errors in our nginx log, either.
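
For anyone wanting to reproduce the START/FINISH logging, here is a minimal
sketch of a Rack middleware that emits lines in the format shown above.  The
class name StartFinishLogger and the io argument are made up for illustration;
this is not necessarily the code from the linked post.

```ruby
# Hypothetical Rack middleware: writes one START line before the request
# is handed to the app and one FINISH line once the app has returned.
class StartFinishLogger
  def initialize(app, io = $stderr)
    @app = app
    @io  = io
  end

  def call(env)
    path = env["PATH_INFO"]
    log("START", path)
    begin
      @app.call(env)
    ensure
      # Runs even if the app raises, so a missing FINISH means the
      # worker never got back out of the application code.
      log("FINISH", path)
    end
  end

  private

  # Single write per line to keep output from 16 workers readable.
  def log(marker, path)
    stamp = Time.now.strftime("%H:%M:%S")
    @io.write("#{stamp}- #{marker}-#{Process.pid}- #{path}\n")
  end
end
```

It would be enabled with "use StartFinishLogger" in config.ru.  Note that
FINISH here only means the app returned its response triplet; the body is
written to the socket afterwards, outside the middleware.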

When we tried the Thread sleep approach, nothing actually appeared in the logs.
I imagine this means that Ruby or some C extension is misbehaving.
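
For reference, a watchdog of this kind could look roughly like the sketch
below.  SlowRequestWatchdog, its limit argument, and the SLOW log marker are
hypothetical names, and the approach actually suggested on the list may
differ.  It uses only Thread and sleep, so it should behave the same on 1.8.7.

```ruby
# Hypothetical watchdog: a background thread that writes a line if the
# guarded block is still running after `limit` seconds.
class SlowRequestWatchdog
  def initialize(limit, io = $stderr)
    @limit = limit
    @io    = io
  end

  def watch(label)
    watcher = Thread.new do
      sleep @limit
      stamp = Time.now.strftime("%H:%M:%S")
      @io.write("#{stamp}- SLOW-#{Process.pid}- #{label} (> #{@limit}s)\n")
    end
    yield
  ensure
    # The block finished (or raised) in time: cancel the pending report.
    watcher.kill if watcher
  end
end
```

Usage would be along the lines of wrapping the request in
"SlowRequestWatchdog.new(30).watch(path) { ... }".  With the green threads of
Ruby 1.8, the watcher thread only gets to run when the interpreter can switch
threads, so an empty log is consistent with a call that blocks the whole
process.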

Unfortunately, we haven't been able to reproduce this in development.

Thoughts?

RHEL 5.6
REE 1.8.7 2011.12
Unicorn 4.6.3
16 unicorn workers on 8 cores
No swap activity, no peaks in load

Again, thanks for all your help!

-Nick
