We had an incident today in which haproxy segfaulted and our site went
down. Unfortunately we did not capture a core, and the segfault message
logged to dmesg only showed the fault inside libc, so there's likely not
much we can do about the crash itself. We'll be making changes to ensure
we capture a core in the future.

However, the issue I am reporting, which is reproducible (on version
1.7.5), is that haproxy did not restart automatically, which would have
minimized the downtime to the site. We use nbproc > 1, so we have
multiple haproxy processes running, and when one of them dies, neither
the "haproxy-master" process nor the "haproxy-systemd-wrapper" process
exits, which prevents systemd from starting the service back up.

While I think having those processes exit so that systemd restarts the
whole service would be fine, a possible alternative would be for the
"haproxy-master" process to restart the dead worker without having to
kill all the other processes.
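
To make that alternative concrete, here is a rough sketch of a master
that respawns only the worker that died. It is not haproxy code, just
plain fork/wait in Python; the names spawn_worker, supervise and
demo_worker, and the max_respawns limit, are made up for illustration.

```python
import os
import signal
import time


def spawn_worker(slot, worker_fn):
    """Fork one worker for the given slot and return its PID."""
    pid = os.fork()
    if pid == 0:
        # Child: run the worker body, then exit without returning to the caller.
        try:
            worker_fn(slot)
        finally:
            os._exit(0)
    return pid


def supervise(n_workers, worker_fn, max_respawns):
    """Keep n_workers children alive, respawning only the one that died.

    Returns a list of (slot, old_pid, new_pid) tuples, one per respawn.
    Stops after max_respawns so the sketch terminates; a real master
    would loop until told to shut down.
    """
    workers = {spawn_worker(slot, worker_fn): slot for slot in range(n_workers)}
    respawns = []
    while len(respawns) < max_respawns:
        pid, _status = os.wait()  # blocks until any child exits
        slot = workers.pop(pid)
        new_pid = spawn_worker(slot, worker_fn)  # the other workers are untouched
        workers[new_pid] = slot
        respawns.append((slot, pid, new_pid))
    # Stop and reap the remaining workers so the example cleans up after itself.
    for pid in workers:
        os.kill(pid, signal.SIGTERM)
    for pid in workers:
        os.waitpid(pid, 0)
    return respawns


def demo_worker(slot):
    """Hypothetical worker: slot 1 dies abruptly, the others just idle."""
    if slot == 1:
        os.kill(os.getpid(), signal.SIGKILL)
    time.sleep(30)
```

Calling supervise(3, demo_worker, 1) starts three workers, notices that
the one in slot 1 has died, and replaces it while the other two keep
running. A real master would also want to rate-limit respawns so that a
worker that crashes on startup does not spin.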

Another possible action would be to leave the workers running, but
signal them to stop accepting new connections, and then let the
"haproxy-master" exit so systemd will restart it.
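
A rough sketch of this second option follows, again in plain Python
rather than haproxy code. drain_survivors is a hypothetical helper name.
SIGUSR1 is chosen because haproxy documents it as its soft-stop signal;
how the master learns the worker PIDs is assumed to be handled
elsewhere.

```python
import os
import signal


def drain_survivors(worker_pids, dead_pid, drain_signal=signal.SIGUSR1):
    """Signal every surviving worker to drain; return the PIDs signalled."""
    signalled = []
    for pid in worker_pids:
        if pid == dead_pid:
            continue  # already gone, nothing to signal
        try:
            os.kill(pid, drain_signal)
            signalled.append(pid)
        except ProcessLookupError:
            # Worker vanished between the wait() and the kill(); skip it.
            pass
    return signalled
```

After calling this, the master would exit with a non-zero status so that
systemd sees a failure and restarts the service. One caveat I have not
verified against our unit file: with systemd's default
KillMode=control-group, the draining workers would probably be killed
as soon as the main process exits, so the unit would likely need a
different KillMode for this approach to help.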

But in any case, I think we need some way of handling this so that site
interruption is minimal.

-Patrick
