On Mon, Aug 02, 2010 at 02:43:30AM -0600, Jerry Champlin wrote:
> Willy:
>
> On the load balancer we have iptables rate limits on ssh connections only.
> I have removed iptables and netfilter entirely on the server side to ensure
> that resets are not coming from iptables. I can watch the resets on the app
> server and see them increasing steadily and incrementing faster when the
> 503s happen.

OK.

> Is it possible that keepalives have something to do with this?

No, because you would see pure resets, not RST+ACK since those would
be in response to an ACK. You may want to check with "netstat -s" on
the servers just in case you'd find more information on the reason
for these RSTs.

> I have searched the apache error logs and do not see anything there. 90%
> of the app will not make use of keepalive with the exception of digest
> auth.

And you're absolutely sure that the app can't cause an apache process
crash from time to time?

There is another possibility that comes to mind. If your SYN backlog is
too short (default settings) AND you have tcp_abort_on_overflow non-zero,
then you can get resets in response to SYNs once the backlog is full.
Please check that on the servers:

    $ cat /proc/sys/net/ipv4/tcp_abort_on_overflow

And if it's not zero, write zero there and see if the problem persists
(SYNs will be dropped instead of causing a RST to be emitted). If the
problem disappears, it means you have to tweak other sysctls in order
to avoid filling the backlog.

Regards,
Willy
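
The tcp_abort_on_overflow check described above can be wrapped in a small helper so it is easy to run on every app server. This is only a sketch: the function name check_abort_on_overflow and its optional path argument are made up for illustration (the argument exists so the function can be tried against a scratch file instead of the live /proc entry), and the messages it prints are not kernel output.

```shell
#!/bin/sh
# Sketch only: report whether the kernel answers a backlog overflow with a RST.
# check_abort_on_overflow and its optional path argument are hypothetical
# names for this example; the default path is the sysctl from the message.
check_abort_on_overflow() {
    file="${1:-/proc/sys/net/ipv4/tcp_abort_on_overflow}"

    # Bail out if the sysctl file cannot be read (non-Linux host, bad path).
    if [ ! -r "$file" ]; then
        echo "unreadable: $file"
        return 2
    fi

    value=$(cat "$file")

    # Zero means connections hitting a full backlog are dropped silently.
    if [ "$value" = "0" ]; then
        echo "tcp_abort_on_overflow=0 (overflow is dropped silently)"
        return 0
    fi

    # Any other value means the kernel emits a RST on overflow.
    echo "tcp_abort_on_overflow=$value (overflow triggers a RST)"
    return 1
}
```

Called with no argument, the function reads the live value and returns 0 only when the setting is already zero. If it reports a non-zero value, writing zero to the same file as root (for example with echo 0 > /proc/sys/net/ipv4/tcp_abort_on_overflow) is the change suggested above; it does not persist across a reboot.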

