Hi,

On Tue, Jul 08, 2014 at 10:36:10PM +0200, hodor wrote:
> Hello,
>
> On 2014-07-08, 15:26:32, Willy Tarreau wrote:
> > Not exactly, I know what's happening, you have a frontend which had both
> > a unix socket and an abstract socket. When resuming, the abstract socket
> > failed and the proxy was marked in error so polling was not re-enabled on
> > its listeners. I still have to see how far we can go to change that. It's
> > very tricky as we don't want to leave a process in a bad state which will
> > never stop for example. Initially when the soft restart was implemented,
> > we were not supposed to have multiple processes listening :-)
> >
> > The pause/resume operations for unix sockets are different than those
> > for other protocols because a file system access is needed, so they're
> > performed by the new process.
> >
> > > Perhaps this could be solved by delaying the rename(tempname, path) and
> > > unlink(backname) after all else is done? Something like .bind_finish()
> > > and .bind_rollback() in struct protocol, where .bind_finish() would be
> > > for "all is okay" and .bind_rollback() for "something else failed,
> > > return the socket to the old haproxy instance"? Those functions could be
> > > called after we are reasonably sure nothing else can fail.
> >
> > All that is properly done. Check your config to ensure you're not in
> > the case above, or alternately, comment out the "fail = 1" statement
> > at line 841 in proxy.c and you will see this annoying behaviour go
> > by itself.
>
> I think we are talking about different problems. The one I mentioned
> doesn't even need abstract sockets at all. It just needs some other
> thing to fail after we have already made the link(), bind(), rename(),
> unlink() stuff.
>
> Let's have these two config files:
>
> conf1:
>
> ---------------------------
> global
>     pidfile /tmp/proxy/pid
>
> defaults
>     mode tcp
>
> listen test1
>     bind unix@/tmp/test1.sock
>     server test1 127.0.0.1:22
> ---------------------------
>
>
> conf2:
>
> ---------------------------
> global
>     pidfile /tmp/proxy/pid
>
> defaults
>     mode tcp
>
> listen test1
>     bind unix@/tmp/test1.sock
>     server test1 127.0.0.1:22
>
> listen test2
>     bind [email protected]:22
>     server test2 127.0.0.1:23
> ---------------------------
>
> First start the first one (daemon necessary for pid file):
>
> ./haproxy -f conf1 -D
>
> "socat stdio unix-connect:/tmp/test1.sock" now works and connects to the
> local SSH.
>
> Now we try to reload haproxy with the second config:
>
> ./haproxy -f conf2 -p pid -D -sf `cat pid`
>
> The whole new haproxy instance will fail as port 22 is occupied by SSH
> and cannot be bound. The new instance unlink()ed the original
> /tmp/test1.sock, so the old instance, although running, is now
> effectively useless.
>
> I tried with and without the "fail = 1" statement present. Did I miss
> something?
>
> I realize it is not entirely fair to change the config this way :). It
> is not a problem for me. I just wanted to point out this can happen.

We were clearly talking about the same thing, but I don't experience this
with my configs: it works perfectly and correctly restores the old socket
when leaving. I'll have to retry with your config, I would not be surprised
to meet a corner case.

Willy

