Hi Andrey,

On Tue, Jun 14, 2016 at 06:39:19AM +0300, Andrey Galkin wrote:
> Hi Willy,
> 
> It seems I fixed the issues:
> 
> 1. Already mentioned fix for epoll
> 2. Implemented posix_spawn() almost for all platform with fallback to
> fork() (USE_POSIX_SPAWN).
>    - A temporary solution to close all file descriptors (not efficient)
> 3. Implemented proper SOCK_CLOEXEC / FD_CLOEXEC on all open sockets
> (USE_SOCK_CLOEXEC)

I finally found some time to review your work, thought a bit about it and
checked some of the portability stuff and overall I'd rather not fix it
this way for several reasons :

1) epoll_create1() was only brought in kernel 2.6.27, so this means that
   on earlier versions we'll still need to perform the close by hand.

2) posix_spawn() doesn't provide *any* benefit over fork()+execve(), it's
   just a libc-provided wrapper which is supposed to also work on minimal
   systems which do not have a working fork() implementation. All the systems
   we support do have a working fork() since it's used early to daemonize the
   process, thus their posix_spawn() simply is fork+execve.

3) there are actually two problems with CLOEXEC. The first one is that it
   requires to be done *everywhere* and that it is quite hard to ensure none
   was missed. For example in your case you missed the pipe() calls. If we
   later implement fd passing using SCM_RIGHTS, there's a risk to miss it
   as well. Also we rely on extra libs like openssl or lua, and you don't
   know whether they have their own FDs (eg: for /dev/random or /dev/crypto).
   The second issue is the performance impact : on all systems where there's
   no CLOEXEC flag available on all syscalls, an extra fcntl() syscall will
   be needed with every socket creation, thus on accept() and connect(). We
   already spend most of our time in syscalls, and slowing down every syscall
   so that an execve() performed once in a while doesn't have to close its fds
   is not a good balance in my opinion.

4) relying on two completely different behaviours depending on the operating
   system combinations will just add more trouble and make bug reports more
   complicated to troubleshoot.

5) it doesn't protect against any FDs inherited from any calling process, so
   it's still not 100% safe as-is.

I ran some performance tests on the close() loop. On my machine I can close
about 17M FD per second, which means that when you run with 100k sockets,
you can perform 170 fork/exec per second, which is already huge. Also people
who implement external checks know that it comes with a cost. For all of
these reasons I'd rather simply close all FDs after the fork(). On some
systems (BSD, Solaris), we may even later switch to closefrom() which does
exactly this.

Regards,
Willy

Reply via email to