Hi James,

When you do a graceful reload of haproxy, this is what happens:

1. The old process accepts no more connections; its stats page is stopped, and so is its stats socket.
2. A new haproxy instance is started, which new clients connect to; this instance holds the live socket (see the sketch below).
3. When the old haproxy instance has no clients left, it dies silently, leaving all the clients on the new haproxy instance.

This is expected behavior, as you want the first haproxy to die when the last client leaves.
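
For a unix socket bind, the takeover in step 2 works roughly like the
following simplified sketch of a bind-then-rename takeover (the function
name and error handling are illustrative, not haproxy's actual uxst code):

    /* Bind to a temporary path, then rename() it over the live path.
     * rename() is atomic, so the live path always points at some
     * listener during the swap. */
    #include <stdio.h>      /* rename() */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int take_over_unix_socket(const char *live_path, const char *tmp_path,
                              int backlog)
    {
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, tmp_path, sizeof(addr.sun_path) - 1);
        unlink(tmp_path);               /* clear any stale temp path */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, backlog) < 0 ||
            rename(tmp_path, live_path) < 0) {
            close(fd);
            return -1;
        }
        return fd;                      /* caller now owns the live path */
    }

Because the rename() is atomic, the live path itself never disappears
during the handoff; the problem in the thread below is what the old
process does afterwards, when it decides whether to unlink that path.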
Regards,

Andrew Smalley
Loadbalancer.org Ltd.

On 12 April 2017 at 19:32, James Brown <[email protected]> wrote:
> This just hit us again on a different set of load balancers... if there's
> a listen socket overflow on a domain socket during a graceful reload,
> haproxy completely deletes the domain socket and becomes inaccessible.
>
> On Tue, Feb 21, 2017 at 6:47 PM, James Brown <[email protected]> wrote:
>
>> Under load, we're sometimes seeing a situation where HAProxy will
>> completely delete a bound unix domain socket after a reload.
>>
>> The "bad flow" looks something like the following:
>>
>> - haproxy is running on pid A, bound to /var/run/domain.sock (via a
>>   bind line in a frontend)
>> - we run `haproxy -sf A`, which starts a new haproxy on pid B
>> - pid B binds to /var/run/domain.sock.B
>> - pid B moves /var/run/domain.sock.B to /var/run/domain.sock (in
>>   uxst_bind_listener)
>> - in the meantime, there are a zillion connections to
>>   /var/run/domain.sock and pid B isn't started up yet; the backlog is
>>   exhausted
>> - pid B signals pid A to shut down
>> - pid A runs the destroy_uxst_socket function and tries to connect to
>>   /var/run/domain.sock to see if it's still in use. The connection fails
>>   (because the backlog is full). Pid A unlinks /var/run/domain.sock.
>>   Everything is sad forever now.
>>
>> I'm thinking about just commenting out the call to destroy_uxst_socket,
>> since this is all on a tmpfs and we don't really care if spare sockets
>> are leaked when/if we change configuration in the future. Arguably, the
>> solution should be to not overflow the listen socket at all; I'm
>> thinking about also binding to a TCP port on localhost and using that
>> for the few seconds it takes to reload (since otherwise we run out of
>> ephemeral ports to 127.0.0.1); it still seems wrong for haproxy to
>> unlink the socket, though.
>>
>> This has proven extremely irritating to reproduce (it only occurs when
>> there's enough load to fill the backlog between when pid B starts up
>> and when pid A shuts down), but I'm pretty confident that what I
>> described above is happening, since periodically after a reload the
>> domain socket isn't there, and this code fits.
>>
>> Our configs are quite large, so I'm not reproducing them here. The
>> reason we bind on a domain socket at all is that we're running two sets
>> of haproxies: one in multi-process mode doing TCP-mode SSL termination,
>> pointing back over a domain socket to a single-process haproxy that
>> applies all of our actual config.
>>
>> --
>> James Brown
>> Systems Engineer
>
> --
> James Brown
> Engineer
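
A minimal sketch of a liveness check that would not misfire in the flow
above (assuming Linux semantics, where a nonblocking connect() to a unix
stream socket whose backlog is full fails with EAGAIN rather than
ECONNREFUSED; the helper name is hypothetical and this is not the actual
destroy_uxst_socket code):

    #define _GNU_SOURCE             /* SOCK_NONBLOCK */
    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Returns 1 only when nothing is listening at <path>.  A full
     * backlog shows up as EAGAIN, not ECONNREFUSED, so a loaded but
     * live socket is left alone instead of being unlinked. */
    int unix_socket_is_dead(const char *path)
    {
        struct sockaddr_un addr;
        int dead = 0;
        int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);

        if (fd < 0)
            return 0;               /* can't tell; err on the side of keeping it */

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
            errno == ECONNREFUSED)
            dead = 1;               /* no listener at all */
        /* EAGAIN (backlog full) or success both mean a listener exists. */
        close(fd);
        return dead;
    }

With a check like this, a full backlog leaves the socket in place, and
only a path with no listener behind it is treated as safe to unlink.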

