Hi again,

On Mon, Oct 24, 2016 at 07:41:06PM +0200, Willy Tarreau wrote:
> I don't know if this is something you're interested in experimenting
> with. This is achieved using fcntl(F_SETLKW). It should be done in the
> wrapper as well.

Finally I did it and it doesn't help at all. The signal-based asynchronous
reload is fundamentally flawed. It's amazing to see how systemd managed to
break something simple and robust for the sake of reliability, by introducing
asynchronous signal delivery...

The problem is not even with overlapping writes (well, they very likely
happen) but with the fact that you never know whom you're sending your
signals to, and that the children may not even be started yet, or may not
have had the time to process the whole config file, etc.

So now I'm wondering what to do with all this mess. Declaring systemd
misdesigned and born with some serious trauma will not help us progress
on this, so we need to work around this pile of crap which tries to prevent
us from dealing with a simple service.

Either we find a way to completely redesign the wrapper, even possibly the
relation between the wrapper and the sub-processes, or we'll simply have
to get rid of the reload action under systemd and reroute it to a restart.

I've thought about something which could possibly work though I'm far from
being sure for now.

Let's say that the wrapper tries to take an exclusive lock on the pidfile
upon receipt of SIGUSR2. It then keeps the file open and passes this FD to
all the haproxy sub-processes. Ideally the FD num is passed as an argument
to the child.

Once the wrapper has done its fork()+exec(), it can simply close its fd. The
exclusive lock is still maintained by the children so it's not lost. The
benefit is that at this
point, until the sub-processes have closed the pid file, there's no way for
the wrapper to pick the same lock again. Thus it can *know* the processes
have not finished booting. This will cause further SIGUSR2 processing to
wait for the children processes to either start or die. Sort of a way to
"pass" the lock to the sub-processes.

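To make the hand-off concrete, here is a minimal sketch in Python (not
haproxy code). It assumes flock() rather than fcntl(F_SETLKW), because POSIX
record locks taken through fcntl() are not inherited by fork()ed children,
whereas a flock() lock belongs to the open file description and is shared
with them. The temporary file standing in for the pidfile, the sleep standing
in for haproxy's startup, and the names lock_is_free() and
demo_lock_handoff() are all hypothetical.

```python
import fcntl
import os
import tempfile
import time


def lock_is_free(path):
    """Return True if an exclusive flock() on path can be taken right now."""
    fd = os.open(path, os.O_RDWR)  # new open file description, so it can conflict
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False  # somebody else still holds the lock
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def demo_lock_handoff(boot_time=0.3):
    """Wrapper locks the file, forks, closes its fd; the child keeps the lock."""
    fd, path = tempfile.mkstemp()  # hypothetical stand-in for the pidfile
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # what the wrapper would do on SIGUSR2
        pid = os.fork()
        if pid == 0:
            # Child: stand-in for haproxy. It inherited fd and shares the lock.
            # A real wrapper would exec() here and would first need
            # os.set_inheritable(fd, True), since Python fds are close-on-exec.
            try:
                time.sleep(boot_time)  # pretend to parse the config
                os.close(fd)           # startup done: last holder releases
            finally:
                os._exit(0)
        os.close(fd)  # wrapper drops its copy; the child still holds the lock
        busy_while_booting = not lock_is_free(path)
        os.waitpid(pid, 0)
        free_after_boot = lock_is_free(path)
        return busy_while_booting, free_after_boot
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo_lock_handoff())
```

In this sketch the wrapper cannot retake the lock while the child is still
"booting", and can retake it once the child has closed its fd or exited, which
is the property the scheme relies on.
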
Here we don't even care if signals are sent in storm because only one of
them will be used and will have to wait for the previous one to be dealt
with.

The model is not perfect and ideally a lock file would be better than using
the pidfile since the pidfile currently is opened late in haproxy and requires
an unlinking in case of successful startup. But I suspect that using extra
files will just make things worse. And I don't know if it's possible to flock
something else (eg: a pipe).

BTW, that just makes me realize that we also have another possibility for this,
precisely using a pipe (which is more portable than file locks). Let's
see if that would work. The wrapper creates a pipe then forks. The child
closes the read side, the parent the write side. Then the parent performs a
read() on this fd and waits until it returns zero. The child calls execve()
to start the haproxy sub-processes. The FD is closed after the pidfile is updated
(and in children). After the last close, the wrapper receives a zero on this
pipe. If haproxy dies, the pipe is closed as well. We could even (ab)use it
to let the wrapper know whether the process properly started or not, or pass
the pids there (though that just needlessly complicates operations).
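
A minimal Python sketch of this pipe variant follows (again not haproxy code).
The function name wait_for_children(), the sleep standing in for startup, and
the optional status bytes written before closing are hypothetical; only the
pipe/fork/read-until-zero sequence comes from the description above.

```python
import os
import time


def wait_for_children(boot_time=0.2, status=b"ready"):
    """Fork a child and block until every copy of the pipe's write end is closed.

    Returns whatever the child wrote before closing (possibly b"").
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the haproxy sub-processes started via execve().
        try:
            os.close(r)            # the child closes the read side
            time.sleep(boot_time)  # pretend to parse config, update the pidfile
            if status:
                os.write(w, status)  # optional: report how startup went
            os.close(w)            # last close: the wrapper's read() sees EOF
        finally:
            os._exit(0)

    os.close(w)  # the parent closes the write side, or read() would never end
    chunks = []
    while True:
        data = os.read(r, 4096)  # blocks while any child still holds w open
        if not data:             # zero bytes: all writers closed or died
            break
        chunks.append(data)
    os.close(r)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(wait_for_children())
```

If the child dies before writing anything, the kernel closes its end of the
pipe and the wrapper simply gets an empty result, so both the "started" and
"died" cases unblock the read().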

Any opinion on this?

Willy
