"Eric Wong" <[email protected]> said: > [email protected] wrote: >> "Eric Wong" <[email protected]> said: >> > >> > Do you have any other requests in your logs which could be taking >> > a long time and hogging workers, but not high enough to trigger the >> > unicorn kill timeout. > >> I don't *think* so. Most requests finish <300ms. We do have some >> more intensive code-paths, but they're administrative and called much >> less frequently. Most of these pages complete in <3seconds. >> >> For requests that made it to rails logging, the LAST processed request >> before the worker timed-out all completed very quickly (and no real >> pattern in terms of which page may be triggering it.) > > This is really strange. This was only really bad for a 7s period?
It was a 7 minute period. All of the workers would become busy and exceed their >120s timeout. Master would kill and re-spawn them, they'd start to respond to a handful of requests (anywhere from 5-50) after which they'd become "busy" again, and get force killed by master. This pattern happened 3 times over a 7 minute period. > Has it happened again? No > Anything else going on with the system at that time? Swapping, > particularly... No swap activity or high load. Our munin graphs indicate a peak of web/app server disk latency around that time, although our graphs show many other similar peaks, without incident. > And if you're inside a VM, maybe your neighbors were hogging things. > Large PUT/POST requests which require filesystem I/O are particularly > sensitive to this. We can't blame virtualization as we're on dedicated hardware. >> > Is this with Unix or TCP sockets? If it's over a LAN, maybe there's >> > still a bad switch/port/cable somewhere (that happens often to me). >> >> TCP sockets, with nginx and unicorn running on the same box. > > OK, that probably rules out a bunch of problems. > Just to be thorough, anything interesting in dmesg or syslogs? Nothing interesting in /var/log/messages. dmesg (no timestamps) includes lines like: TCP: Treason uncloaked! Peer ###.###.###.###:63554/80 shrinks window 1149440448:1149447132. Repaired. >> > With Unix sockets, I don't recall encountering recent problems under >> > Linux. Which OS are you running? >> >> Stock RHEL 5, kernel 2.6.18. > > RHEL 5.0 or 5.x? I can't remember /that/ far back to 5.0 (I don't think > I even tried it until 5.2), but don't recall anything being obviously > broken in those... We're on RHEL 5.6. Thanks, -Nick _______________________________________________ Unicorn mailing list - [email protected] http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying
