On Wed, Mar 03, 2010 at 02:58:47PM -0500, Joe Stein wrote:
> another symptom we saw often was we could not get to the stats page or even
> ssh into the box when this was happening

Ah, that's interesting, because it indicates that your configuration is
not at fault, but that something between the OS and the network is.
Maybe you have long blackouts (for instance due to spanning tree on a
switch) that prevent the servers from draining queued requests. Then
the network comes back later (30 to 50s), you have to add the TCP
retransmit timers on top of that, and since your server connections
don't move in the meantime, the queued pending requests finally expire
after one minute.
You can also sometimes see the same thing with a broken fiber that does
not like people working too close to it. You should definitely check
the logs of all of your switches.
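
To get a feel for how much the TCP retransmit timers can stretch a
blackout, here is a minimal Python sketch. It assumes a fixed initial
retransmission timeout of 200 ms (the Linux minimum RTO) that doubles
after every unanswered attempt, capped at 120 s, and it ignores RTT
measurement entirely. `retransmit_times()` and `first_retransmit_after()`
are hypothetical helpers written for this illustration.

```python
def retransmit_times(initial_rto=0.2, max_rto=120.0, retries=15):
    """Absolute times (seconds after the original send) of each retransmission.

    Assumes the RTO starts at initial_rto and doubles after every
    unanswered attempt, never exceeding max_rto.
    """
    times = []
    t = 0.0
    rto = initial_rto
    for _ in range(retries):
        t += rto                      # next retransmission fires one RTO later
        times.append(t)
        rto = min(rto * 2, max_rto)   # exponential backoff, capped
    return times


def first_retransmit_after(recovery_time, initial_rto=0.2):
    """Time of the first retransmission sent once the network is back,
    or None if the connection gave up before that."""
    for t in retransmit_times(initial_rto):
        if t >= recovery_time:
            return t
    return None
```

With these assumptions, if the network comes back 30 s after the
original send, the next retransmission only leaves at about 51 s, so a
request stuck behind it can easily get close to the one-minute mark.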

Another possibility can appear under load on Linux if you have
ip_conntrack / nf_conntrack loaded with default settings. At some
point your system's connection tracking table fills up and it stops
accepting new connections. This can be quickly detected in the kernel
logs (a message such as "nf_conntrack: table full, dropping packet",
or "ip_conntrack: table full, dropping packet." on older kernels).
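
A quick way to confirm this is to compare the number of tracked
connections with the table size, and to search the kernel log for the
"table full" message. The sketch below assumes the usual /proc
locations for nf_conntrack and for the older ip_conntrack (which pair
exists depends on the kernel version and loaded module).
`conntrack_usage()` and `table_full_lines()` are hypothetical helpers,
and the `root` parameter only exists so the code can be pointed at a
test directory instead of the real /proc.

```python
import re
from pathlib import Path

# Assumed (count, max) file pairs; which one exists depends on whether
# nf_conntrack or the older ip_conntrack module is loaded.
CONNTRACK_FILES = [
    ("proc/sys/net/netfilter/nf_conntrack_count",
     "proc/sys/net/netfilter/nf_conntrack_max"),
    ("proc/sys/net/ipv4/netfilter/ip_conntrack_count",
     "proc/sys/net/ipv4/netfilter/ip_conntrack_max"),
]

# Matches both the nf_conntrack and the older ip_conntrack kernel message.
TABLE_FULL = re.compile(r"(nf|ip)_conntrack: table full, dropping packet")


def conntrack_usage(root="/"):
    """Return (entries in use, table size), or None if conntrack is not loaded."""
    for count_rel, max_rel in CONNTRACK_FILES:
        count_path = Path(root) / count_rel
        max_path = Path(root) / max_rel
        if count_path.exists() and max_path.exists():
            return int(count_path.read_text()), int(max_path.read_text())
    return None


def table_full_lines(log_lines):
    """Return the kernel log lines reporting a full conntrack table."""
    return [line for line in log_lines if TABLE_FULL.search(line)]
```

For instance, run `conntrack_usage()` while the problem is happening and
feed `table_full_lines()` the output of `dmesg` split into lines. If the
count sits at or near the maximum, the conntrack table is the likely
culprit.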
Willy