On inspection, Keepalive is off on the server side, so that's definitely not
it.


> And you're absolutely sure that the app can't cause an apache process crash
> from time to time ?
>

The app could be causing the problem.  We are running in multi-process
rather than multi-threaded mode, and this environment is lightly loaded, so
it's easy to watch.  If crashes were the cause, I would expect to see 5
apache processes across both server nodes being spawned or dying when a 503
happens, and I'm not seeing that.  I don't think we can rule it out, but I'm
not seeing enough process spawning by apache to implicate this as the sole
problem.  I'll increase the retries to 10 to make it more obvious on the
server side if this is indeed the case.
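
For anyone who wants to watch the same thing, here is a rough sketch of how
process spawns and exits could be logged with timestamps and lined up against
the 503s.  The function names (diff_pids, watch_apache), the default process
name "apache2" and the 1-second polling interval are my own assumptions, not
anything from this setup; use "httpd" or whatever your distribution calls the
binary.

```shell
#!/bin/sh
# diff_pids OLD NEW
# Prints "+pid" for each PID present only in NEW (spawned) and
# "-pid" for each PID present only in OLD (exited or crashed).
diff_pids() {
    old=$(echo $1)   # unquoted on purpose: collapses newlines into spaces
    new=$(echo $2)
    for p in $new; do
        case " $old " in *" $p "*) ;; *) echo "+$p" ;; esac
    done
    for p in $old; do
        case " $new " in *" $p "*) ;; *) echo "-$p" ;; esac
    done
}

# watch_apache [NAME]
# Polls the PID list once per second and logs every change with a timestamp.
# NAME defaults to "apache2" (assumption; pass "httpd" where appropriate).
watch_apache() {
    name="${1:-apache2}"
    prev=$(pgrep -x "$name")
    while sleep 1; do
        cur=$(pgrep -x "$name")
        diff_pids "$prev" "$cur" | while read -r change; do
            echo "$(date '+%H:%M:%S') $change"
        done
        prev=$cur
    done
}
```

Running "watch_apache httpd" on each node while the test is in progress should
show a "-pid" line followed by a "+pid" line around the time of a 503 if a
child really is dying.  A process that is spawned and exits within the same
second would be missed, so this is only a coarse check.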

>
> There is another possibility that comes to mind. If your SYN backlog is
> too short (default settings) AND you have tcp_abort_on_overflow non-zero,
> then you can get resets in response to SYNs once the backlog is full.
> Please check that on the servers :
>
>  $ cat /proc/sys/net/ipv4/tcp_abort_on_overflow
>

Already set to 0.


> And if it's not zero, write zero there and see if the problem persists
> (SYNs will be dropped instead of causing a RST to be emitted). If the
> problem disappears, it means you have to tweak other sysctls in order
> to avoid filling the backlog.
>
> Regards,
> Willy
>
>
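
For completeness, a small sketch that prints tcp_abort_on_overflow next to the
backlog-related settings in one go.  Willy did not name which "other sysctls"
he meant; tcp_max_syn_backlog and somaxconn are my assumption about the likely
candidates, and the function name and its optional base-directory argument are
only there for illustration.

```shell
#!/bin/sh
# show_backlog_sysctls [BASE]
# Prints the overflow policy and the backlog limits as "name = value".
# BASE defaults to /proc/sys/net; it is a parameter only so the function
# can be pointed at a copy of the tree taken from another host.
show_backlog_sysctls() {
    base="${1:-/proc/sys/net}"
    for f in ipv4/tcp_abort_on_overflow ipv4/tcp_max_syn_backlog core/somaxconn; do
        if [ -r "$base/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$base/$f")"
        else
            printf '%s = (unreadable)\n' "$f"
        fi
    done
}

show_backlog_sysctls
```

With tcp_abort_on_overflow at 0, a full backlog should show up as dropped SYNs
and client retransmits rather than resets.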
