Hi,

On Tue, Jul 08, 2014 at 02:57:53PM +0200, hodor wrote:
> > So I've pushed that and backported into 1.5 as well before everyone starts
> > to get upset by this nasty behaviour.
>
> Just tried it and seems to work great :). It even recovers the abstract
> sockets smoothly if the new instance fails to start.

Yes, that was the purpose :-)

> The problem with plain unix socket being unreachable (unlinked) when the
> new instance fails to start is still there, though. This can happen when
> we try to bind() on a new tcp port which is occupied by something else
> than haproxy.

Not exactly. I know what's happening: you have a frontend which has both a
unix socket and an abstract socket. When resuming, the abstract socket failed
and the proxy was marked in error, so polling was not re-enabled on its
listeners. I still have to see how far we can go to change that. It's very
tricky, as we don't want to leave a process in a bad state which will never
stop, for example. Initially, when the soft restart was implemented, we were
not supposed to have multiple processes listening :-)

The pause/resume operations for unix sockets are different from those for
other protocols because a file system access is needed, so they're performed
by the new process.

> Perhaps this could be solved by delaying the rename(tempname, path) and
> unlink(backname) after all else is done? Something like .bind_finish()
> and .bind_rollback() in struct protocol, where .bind_finish() would be
> for "all is okay" and .bind_rollback() for "something else failed,
> return the socket to the old haproxy instance"? Those functions could be
> called after we are reasonably sure nothing else can fail.

All that is properly done. Check your config to ensure you're not in the case
above, or alternately, comment out the "fail = 1" statement at line 841 in
proxy.c and you will see this annoying behaviour go away by itself.

Willy

