On Mon, Jun 20, 2016 at 12:50:40PM +0300, Andrey Galkin wrote:
> Hi Willy,
> 
> OK, I see your points.
> 
> Re:
> 1) there is a known problem with epoll: child process affects parent
> process. That was clearly visible in tests even with close() loop in
> child.
> I expect we may try to mitigate that by closing epoll FD before all others.

Yes, definitely. There are two problems with epoll across a fork:
  - FDs that are closed in one process but not in the other continue
    to be reported (the underlying file is still open, so the
    automatic removal from the epoll set is not performed)
  - FDs may randomly be reported to one epoll_wait or the other one.
    Since we're working in level-triggered mode it's not a problem though.
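
The first problem can be reproduced with a small sketch, assuming Linux.
The function name stale_event_after_fork() and the pipe-based setup are
hypothetical and only serve the illustration; this is not HAProxy code.
The parent registers the read end of a pipe, forks, closes its own copy,
and still gets an event for that FD because the child keeps the file open.

```c
#include <assert.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns 1 if the parent's epoll set still
 * reports an FD the parent has already closed, 0 if it does not, and
 * -1 on a setup error. */
int stale_event_after_fork(void)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;
    int ep, n = -1;
    pid_t pid;

    if (pipe(p) < 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0 || (pid = fork()) < 0) {
        close(p[0]);
        close(p[1]);
        close(ep);
        return -1;
    }
    if (pid == 0) {
        /* child: keeps its inherited copy of p[0] open and just sleeps */
        pause();
        _exit(0);
    }

    /* parent: this close() does not release the file since the child
     * still holds it, so the registration is not removed automatically */
    close(p[0]);

    /* make the pipe readable, then poll with a 1 second timeout */
    if (write(p[1], "x", 1) == 1)
        n = epoll_wait(ep, &out, 1, 1000);

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    close(p[1]);
    close(ep);

    if (n < 0)
        return -1;
    return n == 1 && out.data.fd == p[0];
}
```

The event carries the FD number the parent has already closed, which is
why the event loop has to cope with such stale reports.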

The epoll fd is one of the first ones, if not the first one, so the
loop starting from 0 to upper values will close it first (or almost
first) anyway.

If we wanted to be 100% clean, we should have the parent wait for the
child to close before going on, but I don't think we need this.
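
One possible way to make the parent wait, sketched here under
assumptions: the helper name fork_and_wait_for_close(), its parameters
and the synchronization pipe are all hypothetical, not existing HAProxy
code. The child closes the epoll FD first and then every FD from 0
upward; the parent blocks on a pipe until the child's copy of the write
end is gone, which happens as part of that same close loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Forks a child that closes epfd first, then all
 * FDs in [0, maxfd). Returns 0 once the parent has seen the child drop
 * its inherited FDs, -1 on error. */
int fork_and_wait_for_close(int epfd, int maxfd)
{
    int sync_pipe[2];
    char c;
    ssize_t r;
    pid_t pid;

    if (pipe(sync_pipe) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        int fd;

        close(epfd);                 /* epoll FD goes first */
        for (fd = 0; fd < maxfd; fd++)
            close(fd);               /* also closes the pipe's write end */
        /* a real check would reopen FDs 0-2 and exec the script here */
        _exit(0);
    }

    /* parent: drop our write end, then read until EOF, which only
     * arrives once the child no longer holds the write end either */
    close(sync_pipe[1]);
    do {
        r = read(sync_pipe[0], &c, 1);
    } while (r < 0 && errno == EINTR);
    close(sync_pipe[0]);

    /* reap the child so the sketch leaves no zombie behind */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;
    return r == 0 ? 0 : -1;
}
```

This blocks the parent for the duration of the close loop, which is the
cost of being fully clean.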

> 2) posix_spawn() actually helps with large VM size (optionally,
> constrained by limits of RAM). I guess, a significant part of running
> HAProxy VM is a constantly changing socket data. So, each fork() on a
> large instance should stress the system quite hard due to almost full
> copy-on-write. However, only benchmarks under full load can tell the
> truth.

That's the theory but not the reality. The reality is that modern
operating systems use a copy-on-write fork() implementation that is
quite cheap, to the point that vfork() was deprecated some time ago
already. Thus posix_spawn() (which is only a wrapper around fork)
still calls fork() and obviously cannot be cheaper than the syscalls
it relies on. Furthermore it does extra work such as signal
manipulation. Thus posix_spawn() is fork+exec+stuff, while we only do
fork+exec, so by definition we do fewer things. I'd have welcomed it
if it were for portability reasons since the difference should be
fairly low in terms of performance, but since the argument is
performance I disagree (and portability is already granted).
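
For comparison, here is a minimal sketch of the same job done both ways.
The helper names run_fork_exec() and run_posix_spawn() and the use of
/bin/sh are assumptions made for the example. It only shows that the two
forms are interchangeable from the caller's side; it measures nothing,
and how posix_spawn() is implemented underneath depends on the C library.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Wait for pid and return its exit code, or -1 if it did not exit
 * normally. */
static int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Run "/bin/sh -c cmd" with plain fork() + execve(). */
int run_fork_exec(const char *cmd)
{
    char *const argv[] = { "/bin/sh", "-c", (char *)cmd, NULL };
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(argv[0], argv, environ);
        _exit(127);                  /* only reached if execve() failed */
    }
    return wait_exit_status(pid);
}

/* Run "/bin/sh -c cmd" with posix_spawn(), no file actions and no
 * spawn attributes. */
int run_posix_spawn(const char *cmd)
{
    char *const argv[] = { "/bin/sh", "-c", (char *)cmd, NULL };
    pid_t pid;

    if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_exit_status(pid);
}
```

Both helpers return the exit code of the command, so a caller can swap
one for the other and benchmark them under load.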

> To avoid issues mentioned above, what do you think about adding a
> separate "clean" process to invoke external-checks and then update
> connection handling processes "set server"-like way? That's mostly a
> solution with backward compatibility in mind.

It's the principle of an external check server we started to develop
5 years ago or so at work, but we stopped it when facing a number of
subtle shortcomings. It was designed to be able to merge check requests
from multiple processes, and to possibly report composite results. But
at the same time checks were being improved to expose more information,
such as success or failure captures, in logs and/or the stats page.
It's also difficult to pass certain arguments when
some elements are possibly variable and must be controlled by the
caller. It doesn't cope well with send/expect sequences for example.

I think health checks will eventually have to be completely redesigned
and external checks will possibly have to move to a non-chrooted,
independent process. But that's a minor part of a whole redesign :-/

Willy

