Hi again,

On Mon, Oct 24, 2016 at 07:41:06PM +0200, Willy Tarreau wrote:
> I don't know if this is something you're interested in experimenting
> with. This is achieved using fcntl(F_SETLKW). It should be done in the
> wrapper as well.

Finally I did it and it doesn't help at all. The signal-based
asynchronous reload is fundamentally flawed. It's amazing to see how
systemd managed to break something simple and robust for the sake of
reliability, by introducing asynchronous signal delivery...

The problem is not even with overlapping writes (well, they very likely
happen), but with the fact that you never know whom you're sending your
signals to, and that the children may not even be started yet, or may
not have had the time to process the whole config file, etc.

So now I'm wondering what to do with all this mess. Declaring systemd
misdesigned and born with some serious trauma will not help us progress
on this, so we need to work around this pile of crap which tries to
prevent us from dealing with a simple service.

Either we find a way to completely redesign the wrapper, possibly even
the relation between the wrapper and the sub-processes, or we'll simply
have to get rid of the reload action under systemd and reroute it to a
restart.

I've thought about something which could possibly work, though I'm far
from being sure for now. Let's say that the wrapper tries to take an
exclusive lock on the pidfile upon receipt of SIGUSR2. It then keeps the
file open and passes this FD to all the haproxy sub-processes. Ideally
the FD number is passed as an argument to the child. Once the wrapper
has done its fork()+exec(), it can simply close its own fd. The
exclusive lock is still maintained by the children, so it's not lost.

The benefit is that at this point, until the sub-processes have closed
the pid file, there's no way for the wrapper to take the same lock
again. Thus it can *know* the processes have not finished booting. This
will cause further SIGUSR2 processing to wait for the child processes to
either start or die. Sort of a way to "pass" the lock to the
sub-processes. Here we don't even care if signals are sent in a storm,
because only one of them will be used and it will have to wait for the
previous one to be dealt with.

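To make the lock-passing idea above concrete, here is a rough sketch, in
Python only to keep it short. It is not the wrapper's code. Two things
in it are assumptions: it uses flock() instead of fcntl(F_SETLKW),
because fcntl() record locks are not inherited across fork() while a
flock() lock stays held as long as any copy of the descriptor is open,
and it uses a plain sleep as a stand-in for haproxy's startup instead of
doing the exec(). The function names and the lock path are made up for
the example.

```python
import fcntl
import os
import tempfile
import time


def start_child_holding_lock(lock_path, boot_seconds):
    """Take an exclusive lock, fork, and leave the lock with the child.

    Hypothetical helper: the child only sleeps to imitate startup work.
    A real wrapper would exec() here, and in Python the descriptor would
    first need os.set_inheritable(fd, True) to survive the exec().
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    # Blocks for as long as a previously started child still holds the lock.
    fcntl.flock(fd, fcntl.LOCK_EX)
    pid = os.fork()
    if pid == 0:
        time.sleep(boot_seconds)  # stand-in for config parsing, binding, etc.
        os.close(fd)              # boot finished: last copy closed, lock released
        os._exit(0)
    # Parent drops its copy; the lock survives through the child's copy.
    os.close(fd)
    return pid


def still_booting(lock_path):
    """Return True if some process still holds the lock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    finally:
        os.close(fd)  # closing also drops the lock if we just acquired it
    return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wrapper.lock")
    child = start_child_holding_lock(path, boot_seconds=0.5)
    print("still booting:", still_booting(path))
    os.waitpid(child, 0)
    print("still booting:", still_booting(path))
```

A second call to start_child_holding_lock() on the same path would block
in flock() until the first child has closed its descriptor or died,
which is the serialization of SIGUSR2 handling described above.
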
The model is not perfect, and ideally a lock file would be better than
using the pidfile, since the pidfile is currently opened late in haproxy
and needs to be unlinked in case of successful startup. But I suspect
that using extra files will just make things worse. And I don't know if
it's possible to flock something else (eg: a pipe).

BTW, that just makes me realize that we also have another possibility
for this, precisely using a pipe (which is more portable than file
locks). Let's see if that would work. The wrapper creates a pipe, then
forks. The child closes the read side, the parent the write side. Then
the parent performs a read() on this fd and waits until it returns zero.
The child calls execve() to start the haproxy sub-processes. The FD is
closed after the pidfile is updated (and in the children). After the
last close, the wrapper's read() returns zero on this pipe. If haproxy
dies, the pipe is closed as well. We could even (ab)use it to let the
wrapper know whether the process properly started or not, or pass the
pids there (though that just needlessly complicates operations).

Any opinion on this?

Willy
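
A rough sketch of the pipe variant described above, again in Python for
brevity and not taken from the wrapper. The function name and the sleeps
are assumptions made for the example: the sleeps stand in for haproxy's
startup and for its normal run time, and the execve() step is left out
(in Python the write side would need os.set_inheritable() to survive
it).

```python
import os
import time


def spawn_and_wait_ready(boot_seconds, fail=False):
    """Fork a child and block until the pipe's write side is fully closed.

    Hypothetical helper. The parent's read() returns an empty result
    (zero bytes, i.e. EOF) once every copy of the write side is closed,
    whether the child closed it on purpose or died.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)          # the child keeps only the write side
        time.sleep(boot_seconds)   # stand-in for startup and pidfile update
        if fail:
            os._exit(1)            # dying closes write_fd implicitly
        os.close(write_fd)         # explicit "I have finished booting"
        time.sleep(boot_seconds)   # the service keeps running afterwards
        os._exit(0)
    os.close(write_fd)             # the parent keeps only the read side
    # Discard any data and wait for EOF, which os.read() reports as b"".
    while os.read(read_fd, 1) != b"":
        pass
    os.close(read_fd)
    return pid
```

The call returns as soon as the child has either finished booting or
died, so the caller still has to waitpid() to tell the two cases apart,
unless the pipe is also used to carry a status as suggested above.
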

