On Mon, 06 Mar 2017 15:02:43 -0500, Willy Tarreau <[email protected]> wrote:
OK so that means that haproxy could have hung in a day or two, then your
case is much more common than one of the other reports. If your fdront LB
is fair between the 6 servers, that could be related to a total number of
requests or connections or something like this.
Another relevant point is that these servers are tied together using
upstream, GeoIP-based DNS load balancing. So the request rate across
servers varies quite a bit depending on the location. This would make a
synchronized failure based on total requests less likely.
I'm thinking about other things :
- if you're doing a lot of SSL we could imagine an issue with random
generation using /dev/random instead of /dev/urandom. I've met this
issue a long time ago on some apache servers where all the entropy
was progressively consumed until it was not possible anymore to get
a connection.
I'll set up a script to capture the netstat and other info prior to
reloading should this issue re-occur.
As for SSL, yes, we do a fair bit of SSL ( about 30% of total request
count ) and HAProxy does the TLS termination and then hands off via TCP
proxy.
Best,
-=Mark S.