So we had an incident today where haproxy segfaulted and our site went down. Unfortunately we did not capture a core, and the segfault message logged to dmesg only showed the fault occurring inside libc, so there's likely not much we can do about the crash itself. We'll be making changes to ensure we capture a core in the future.
However, the issue I am reporting, which is reproducible on version 1.7.5, is that haproxy did not automatically restart, which would have minimized the downtime to the site. We use nbproc > 1, so we have multiple haproxy processes running. When one of them dies, neither the "haproxy-master" process nor the "haproxy-systemd-wrapper" process exits, which prevents systemd from starting the service back up.

While I think having them exit (so systemd restarts the service) would be fine, a possible alternative would be for the "haproxy-master" process to restart the dead worker without having to kill all the other processes. Another option would be to leave the workers running but signal them to stop accepting new connections, and then let "haproxy-master" exit so systemd restarts it. In any case, I think we need some way of handling this so that site interruption is minimal.

-Patrick
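The first alternative (the master respawns only the worker that died) is the usual fork/waitpid supervision loop. Below is a minimal Python sketch of that pattern, assuming a plain parent/child process model. It is not HAProxy's code, and `spawn`, `supervise`, `target`, and `max_restarts` are hypothetical names used only for illustration. A real master would also have to hand the listening sockets to the replacement worker.

```python
import os
import signal


def spawn(worker_id, target):
    """Fork one worker (hypothetical helper). The child runs target(worker_id), then exits."""
    pid = os.fork()
    if pid == 0:
        # Child: run the worker body and never return into the supervisor loop.
        code = 0
        try:
            target(worker_id)
        except BaseException:
            code = 1
        os._exit(code)
    return pid


def supervise(target, nbproc, max_restarts):
    """Keep nbproc workers alive, respawning only the one that died (hypothetical sketch).

    Returns a list of (worker_id, wait_status) for every worker exit observed.
    After max_restarts respawns it stops replacing workers and just reaps the rest.
    """
    workers = {spawn(i, target): i for i in range(nbproc)}
    deaths = []
    restarts = 0
    while workers:
        # Block until any child exits; the surviving workers keep running untouched.
        pid, status = os.waitpid(-1, 0)
        worker_id = workers.pop(pid, None)
        if worker_id is None:
            continue  # not one of our workers
        deaths.append((worker_id, status))
        if restarts < max_restarts:
            restarts += 1
            workers[spawn(worker_id, target)] = worker_id
    return deaths


if __name__ == "__main__":
    def crash(worker_id):
        # Simulate a worker dying abnormally, as in the segfault described above.
        os.kill(os.getpid(), signal.SIGKILL)

    for worker_id, status in supervise(crash, nbproc=2, max_restarts=2):
        print("worker", worker_id, "killed by signal", os.WTERMSIG(status))
```

The `max_restarts` cap is there so a worker that crashes on every start does not turn into a fork loop. At that point, exiting and letting systemd take over, as described above, is the fallback.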

