Hi,

On Tue, Jul 08, 2014 at 10:36:10PM +0200, hodor wrote:
> Hello,
> 
> On 2014-07-08, 15:26:32, Willy Tarreau wrote:
> > Not exactly, I know what's happening, you have a frontend which had both
> > a unix socket and an abstract socket. When resuming, the abstract socket
> > failed and the proxy was marked in error so polling was not re-enabled on
> > its listeners. I still have to see how far we can go to change that. It's
> > very tricky as we don't want to leave a process in a bad state which will
> > never stop for example. Initially when the soft restart was implemented,
> > we were not supposed to have multiple processes listening :-)
> > 
> > The pause/resume operations for unix sockets are different than those
> > for other protocols because a file system access is needed, so they're
> > performed by the new process.
> > 
> > > Perhaps this could be solved by delaying the rename(tempname, path) and
> > > unlink(backname) after all else is done? Something like .bind_finish()
> > > and .bind_rollback() in struct protocol, where .bind_finish() would be
> > > for "all is okay" and .bind_rollback() for "something else failed,
> > > return the socket to the old haproxy instance"? Those functions could be
> > > called after we are reasonably sure nothing else can fail.
> > 
> > All that is properly done. Check your config to ensure you're not in
> > the case above, or alternately, comment out the "fail = 1" statement
> > at line 841 in proxy.c and you will see this annoying behaviour go
> > by itself.
> 
> I think we are talking about different problems. The one I mentioned
> doesn't even need abstract sockets at all. It just needs some other
> thing to fail after we have already made the link(), bind(), rename(),
> unlink() stuff.
> 
> Let's have these two config files:
> 
> conf1:
> 
> ---------------------------
> global
>   pidfile /tmp/proxy/pid
> 
> defaults
>   mode tcp
> 
> listen test1
>   bind unix@/tmp/test1.sock
>   server test1 127.0.0.1:22
> ---------------------------
> 
> 
> conf2:
> 
> ---------------------------
> global
>   pidfile /tmp/proxy/pid
> 
> defaults
>   mode tcp
> 
> listen test1
>   bind unix@/tmp/test1.sock
>   server test1 127.0.0.1:22
> 
> listen test2
>   bind [email protected]:22
>   server test2 127.0.0.1:23
> ---------------------------
> 
> First start the first one (daemon necessary for pid file):
> 
> ./haproxy -f conf1 -D
> 
> "socat stdio unix-connect:/tmp/test1.sock" now works and connects to the
> local SSH.
> 
> Now we try to reload haproxy with the second config:
> 
> ./haproxy -f conf2 -p pid -D -sf `cat pid`
> 
> The whole new haproxy instance will fail as port 22 is occupied by SSH
> and cannot be bound. The new instance unlink()ed the original
> /tmp/test1.sock, so the old instance, although running, is now
> effectively useless.
> 
> I tried with and without the "fail = 1" statement present. Did I miss
> something?
> 
> I realize it is not entirely fair to change the config this way :). It
> is not a problem for me. I just wanted to point out this can happen.

We were clearly talking about the same thing, but I don't experience
this with my configs: it works perfectly and correctly restores the
old socket when leaving. I'll have to retry with your config; I would
not be surprised to meet a corner case.
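
For illustration only, the deferred-publish idea quoted above
(.bind_finish() / .bind_rollback()) can be sketched as follows. This is
Python, not haproxy's C code, and it does not describe what proxy.c does
today. The names DeferredUnixBind and bind_all and the temporary-name
scheme are hypothetical. The sketch assumes that rename() atomically
replaces the old socket path, so nothing needs to be restored when some
other listener fails.

```python
import os
import socket


class DeferredUnixBind:
    """Hypothetical helper: bind under a temporary name, publish later."""

    def __init__(self, path):
        self.path = path
        # Assumed naming scheme for the temporary socket file.
        self.tempname = "%s.%d.tmp" % (path, os.getpid())
        self.sock = None

    def prepare(self):
        # Bind next to the final path; the old socket file is untouched.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.bind(self.tempname)
        self.sock.listen(16)

    def finish(self):
        # "All is okay": atomically replace the old path with the new socket.
        os.rename(self.tempname, self.path)

    def rollback(self):
        # "Something else failed": drop the temp socket, keep the old one.
        if self.sock is not None:
            self.sock.close()
            self.sock = None
        if os.path.exists(self.tempname):
            os.unlink(self.tempname)


def bind_all(listeners):
    """Prepare every listener; publish only if all of them succeeded."""
    attempted = []
    try:
        for listener in listeners:
            attempted.append(listener)
            listener.prepare()
    except OSError:
        # Undo everything attempted so far, including the one that failed.
        for listener in attempted:
            listener.rollback()
        return False
    for listener in listeners:
        listener.finish()
    return True
```

With this ordering, a failure such as the busy port 22 in conf2 happens
before any rename(), so the old process keeps receiving connections on
/tmp/test1.sock.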

Willy

