"Eric Wong" <[email protected]> said:

> [email protected] wrote:
>> "Eric Wong" <[email protected]> said:
>> >
>> > Do you have any other requests in your logs which could be taking
>> > a long time and hogging workers, but not high enough to trigger the
>> > unicorn kill timeout.
> 
>> I don't *think* so.  Most requests finish <300ms.  We do have some
>> more intensive code-paths, but they're administrative and called much
>> less frequently.  Most of these pages complete in <3seconds.
>>
>> For requests that made it to rails logging, the LAST processed request
>> before the worker timed-out all completed very quickly (and no real
>> pattern in terms of which page may be triggering it.)
> 
> This is really strange.  This was only really bad for a 7s period?

It was a 7 minute period.  All of the workers would become busy and exceed 
their >120s timeout.  Master would kill and re-spawn them, they'd start to 
respond to a handful of requests (anywhere from 5-50) after which they'd become 
"busy" again, and get force killed by master.  This pattern happened 3 times 
over a 7 minute period.

> Has it happened again?

No

> Anything else going on with the system at that time?  Swapping, 
> particularly...

No swap activity or high load.  Our munin graphs indicate a peak of web/app 
server disk latency around that time, although our graphs show many other 
similar peaks, without incident.

> And if you're inside a VM, maybe your neighbors were hogging things.
> Large PUT/POST requests which require filesystem I/O are particularly
> sensitive to this.

We can't blame virtualization as we're on dedicated hardware.

>> > Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
>> > still a bad switch/port/cable somewhere (that happens often to me).
>>
>> TCP sockets, with nginx and unicorn running on the same box.
> 
> OK, that probably rules out a bunch of problems.
> Just to be thorough, anything interesting in dmesg or syslogs?

Nothing interesting in /var/log/messages.  dmesg (no timestamps) includes lines 
like:

TCP: Treason uncloaked! Peer ###.###.###.###:63554/80 shrinks window 
1149440448:1149447132. Repaired.

>> > With Unix sockets, I don't recall encountering recent problems under
>> > Linux.  Which OS are you running?
>>
>> Stock RHEL 5, kernel 2.6.18.
> 
> RHEL 5.0 or 5.x?  I can't remember /that/ far back to 5.0 (I don't think
> I even tried it until 5.2), but don't recall anything being obviously
> broken in those...

We're on RHEL 5.6.

Thanks,

-Nick

_______________________________________________
Unicorn mailing list - [email protected]
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Reply via email to