A barrage of unexplained timeouts

nick Tue, 20 Aug 2013 07:58:03 -0700

We've been running unicorn-3.6.2 on REE 1.8.7 2011.12 in production for quite
some time, and we use monit to monitor each unicorn worker.  Occasionally, I'll
get a notification that a worker has timed out and been respawned.  In all
these cases, when I look at the rails logs, I can see the last request that the
worker handled, and each one appears to have completed successfully from the
client's perspective (rails and nginx respond with 200), but the unicorn log
shows that the worker was killed due to timeout.  This has always been
relatively rare, and I thought it was a non-problem.


Until today.

Today, for about 7 minutes, our workers were repeatedly reported as having
timed out and were killed by the master.  After respawning, the workers would
serve a handful of requests and then be killed again.

During this time, our servers (Web, PG DB, and redis) were not under load and
IO was normal.  After the last monit notification at 8:30, everything went back
to normal.  I understand why unicorns would time out if they were waiting (>120
secs) on IO, but there aren't any orphaned requests in the rails log.  For each
request line, there's a corresponding completion line.  No long-running queries
to blame on PG, either.
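
The pairing of request and completion lines described above could be checked
mechanically with something like the sketch below.  It is only an
illustration: the helper name orphaned_requests and both patterns are
assumptions (they are not taken from our actual logs and would need adjusting
to the real rails log format), and it assumes the log lines of different
workers are not interleaved.

```ruby
# Hypothetical patterns; adjust them to what the rails log actually contains.
START_RE = /\AStarted (\S+) "([^"]*)"/
DONE_RE  = /\ACompleted (\d{3})/

# Returns every request line that was never followed by a completion line
# before the next request started (or before the log ended).
# Assumes a single, non-interleaved stream of log lines.
def orphaned_requests(lines, start_re = START_RE, done_re = DONE_RE)
  open_request = nil
  orphans = []
  lines.each do |line|
    if line =~ start_re
      # A new request began while the previous one was still open.
      orphans << open_request if open_request
      open_request = line.chomp
    elsif line =~ done_re
      open_request = nil
    end
  end
  # A request still open at the end of the log never completed.
  orphans << open_request if open_request
  orphans
end
```

Feeding it File.foreach("log/production.log") (path assumed) would list any
request that has no matching completion line; an empty result matches what I
see by eye.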

I know we're probably due for an upgrade, but I'm hoping to get to the bottom 
of these unexplained timeouts.

Thanks for your help!

-Nick

_______________________________________________
Unicorn mailing list - [email protected]
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying
