Hi Willy, It seems I fixed the issues:
1. Already mentioned fix for epoll 2. Implemented posix_spawn() almost for all platform with fallback to fork() (USE_POSIX_SPAWN). - A temporary solution to close all file descriptors (not efficient) 3. Implemented proper SOCK_CLOEXEC / FD_CLOEXEC on all open sockets (USE_SOCK_CLOEXEC) You can merge the changes from: https://github.com/andvgal/haproxy/tree/20160614-fix-cloexec Comparison against master: https://github.com/andvgal/haproxy/compare/20160614-fix-cloexec?expand=1 I did as much testing as I could in scope of the problem. My original test case with 10ms inter generates some false positive by itself - that's where less than 1% of fails came from. Patches are below: >From eaa39496fd68afeff3b2cd1e4294518c3503db84 Mon Sep 17 00:00:00 2001 From: Andrey Galkin <[email protected]> Date: Tue, 14 Jun 2016 03:25:11 +0300 Subject: [PATCH 1/3] BUG/MINOR: checks: fixed to create epoll with CLOEXEC This is only part of fixed to general issue of leaking file descriptors to child external-check processes and leading to connection data corruption and disconnects. --- src/ev_epoll.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/ev_epoll.c b/src/ev_epoll.c index ccb7c33..dbfded1 100644 --- a/src/ev_epoll.c +++ b/src/ev_epoll.c @@ -176,7 +176,7 @@ REGPRM1 static int _do_init(struct poller *p) { p->private = NULL; - epoll_fd = epoll_create(global.maxsock + 1); + epoll_fd = epoll_create1(EPOLL_CLOEXEC); if (epoll_fd < 0) goto fail_fd; @@ -222,7 +222,7 @@ REGPRM1 static int _do_test(struct poller *p) { int fd; - fd = epoll_create(global.maxsock + 1); + fd = epoll_create1(EPOLL_CLOEXEC); if (fd < 0) return 0; close(fd); @@ -239,7 +239,7 @@ REGPRM1 static int _do_fork(struct poller *p) { if (epoll_fd >= 0) close(epoll_fd); - epoll_fd = epoll_create(global.maxsock + 1); + epoll_fd = epoll_create1(EPOLL_CLOEXEC); if (epoll_fd < 0) return 0; return 1; -- 2.1.4 >From fd60e2493b6df8d5bdfa610261464095b7b2b070 Mon Sep 17 00:00:00 2001 From: Andrey Galkin <[email protected]> Date: Tue, 14 Jun 2016 03:53:48 +0300 Subject: [PATCH 2/3] BUG/MEDIUM: checks: fixed to close all FDs on external-check OPTIM/MEDIUM: checks: conditional support of posix_spawn() for external-check Calling fork() is suboptimal due VM management expense. posix_spawn() is relatively thin call spawning a new process from clean image what can make significant difference in heavy deployments. posix_spawn() is controlled by new USE_POSIX_SPAWN setting which is enabled by default for newer Linux, Solaris, OSX and *BSD. --- Makefile | 13 +++++++++++++ src/checks.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/Makefile b/Makefile index 08d6108..02037ee 100644 --- a/Makefile +++ b/Makefile @@ -37,6 +37,7 @@ # USE_TFO : enable TCP fast open. Supported on Linux >= 3.7. # USE_NS : enable network namespace support. Supported on Linux >= 2.6.24. # USE_DL : enable it if your system requires -ldl. Automatic on Linux. +# USE_POSIX_SPAWN : enable posix_spawn() support instead of fork() # USE_DEVICEATLAS : enable DeviceAtlas api. # USE_51DEGREES : enable third party device detection library from 51Degrees # @@ -283,6 +284,7 @@ ifeq ($(TARGET),linux2628) ASSUME_SPLICE_WORKS= implicit EXTRA += haproxy-systemd-wrapper USE_DL = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),solaris) # This is for Solaris 8 @@ -294,6 +296,7 @@ ifeq ($(TARGET),solaris) USE_LIBCRYPT = implicit USE_CRYPT_H = implicit USE_GETADDRINFO = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),freebsd) # This is for FreeBSD @@ -301,6 +304,7 @@ ifeq ($(TARGET),freebsd) USE_KQUEUE = implicit USE_TPROXY = implicit USE_LIBCRYPT = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),osx) # This is for Mac OS/X @@ -308,18 +312,21 @@ ifeq ($(TARGET),osx) USE_KQUEUE = implicit USE_TPROXY = implicit USE_LIBCRYPT = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),openbsd) # This is for OpenBSD >= 3.0 USE_POLL = implicit USE_KQUEUE = implicit USE_TPROXY = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),netbsd) # This is for NetBSD USE_POLL = implicit USE_KQUEUE = implicit USE_TPROXY = implicit + USE_POSIX_SPAWN = implicit else ifeq ($(TARGET),aix51) # This is for AIX 5.1 @@ -551,6 +558,12 @@ BUILD_OPTIONS += $(call ignore_implicit,USE_DL) OPTIONS_LDFLAGS += -ldl endif +ifneq ($(USE_POSIX_SPAWN),) +OPTIONS_CFLAGS += -DUSE_POSIX_SPAWN +BUILD_OPTIONS += $(call ignore_implicit,USE_POSIX_SPAWN) +endif + + # report DLMALLOC_SRC only if explicitly specified ifneq ($(DLMALLOC_SRC),) BUILD_OPTIONS += DLMALLOC_SRC=$(DLMALLOC_SRC) diff --git a/src/checks.c b/src/checks.c index c4ac947..3df25eb 100644 --- a/src/checks.c +++ b/src/checks.c @@ -22,6 +22,9 @@ #include <string.h> #include <time.h> #include <unistd.h> +#ifdef USE_POSIX_SPAWN +#include <spawn.h> +#endif /* USE_POSIX_SPAWN */ #include <sys/socket.h> #include <sys/types.h> #include <sys/wait.h> @@ -1821,10 +1824,43 @@ static int connect_proc_chk(struct task *t) struct proxy *px = s->proxy; int status; pid_t pid; + int fd; + /* In ideal situation, we should close all sockets with CLOEXEC. + * Unfortunately, tests show that closing only stdin/stdout/stderr does not help. + */ + int fd_limit = maxfd; +#ifdef USE_POSIX_SPAWN + int spawn_res; + posix_spawn_file_actions_t spawn_file_actions; +#endif /* USE_POSIX_SPAWN */ status = SF_ERR_RESOURCE; block_sigchld(); + +#ifdef USE_POSIX_SPAWN + /* Should not be a problem */ + extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); + + /* The correct way is to open sockets with SOCK_CLOEXEC and/or set FD_CLOEXEC later. + This code should be removed when all descriptors get proper CLOEXEC. */ + posix_spawn_file_actions_init(&spawn_file_actions); + + for (fd = 0; fd <= fd_limit; ++fd) { + posix_spawn_file_actions_addclose(&spawn_file_actions, fd); + } + + spawn_res = posix_spawn(&pid, px->check_command, &spawn_file_actions, NULL, check->argv, check->envp); + posix_spawn_file_actions_destroy(&spawn_file_actions); + + if (spawn_res != 0) { + Alert("Failed to fork process for external health check: %s. Aborting.\n", + strerror(spawn_res)); + set_server_check_status(check, HCHK_STATUS_SOCKERR, strerror(spawn_res)); + goto out; + } + +#else /* USE_POSIX_SPAWN */ pid = fork(); if (pid < 0) { @@ -1836,6 +1872,13 @@ static int connect_proc_chk(struct task *t) if (pid == 0) { /* Child */ extern char **environ; + + /* The correct way is to open sockets with SOCK_CLOEXEC and/or set FD_CLOEXEC later. + This code should be removed when all descriptors get proper CLOEXEC. */ + for (fd = 0; fd <= fd_limit; ++fd) { + close(fd); + } + environ = check->envp; extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); execvp(px->check_command, check->argv); @@ -1844,6 +1887,8 @@ static int connect_proc_chk(struct task *t) exit(-1); } +#endif /* USE_POSIX_SPAWN */ + /* Parent */ if (check->result == CHK_RES_UNKNOWN) { if (pid_list_add(pid, t) != NULL) { -- 2.1.4 >From 4949f2db93beb0d3b0a28a8dd0bffb0a226ba2ef Mon Sep 17 00:00:00 2001 From: Andrey Galkin <[email protected]> Date: Tue, 14 Jun 2016 06:23:16 +0300 Subject: [PATCH 3/3] BUG/MEDIUM: checks/polls: implemented CLOEXEC on all FDs Implemented efficient SOCK_CLOEXEC on socket() and accept() for Linux. Implemented conditional fcntl() for other platforms. --- Makefile | 8 ++++++++ include/common/accept4.h | 4 ++++ include/proto/fd.h | 25 +++++++++++++++++++++++++ src/checks.c | 23 +---------------------- src/dns.c | 5 ++++- src/listener.c | 13 ++++++++++--- src/log.c | 6 +++++- src/namespace.c | 7 ++++++- src/proto_uxst.c | 18 +++++++++++++++--- 9 files changed, 78 insertions(+), 31 deletions(-) diff --git a/Makefile b/Makefile index 02037ee..775e257 100644 --- a/Makefile +++ b/Makefile @@ -38,6 +38,7 @@ # USE_NS : enable network namespace support. Supported on Linux >= 2.6.24. # USE_DL : enable it if your system requires -ldl. Automatic on Linux. # USE_POSIX_SPAWN : enable posix_spawn() support instead of fork() +# USE_SOCK_CLOEXEC : enable use of SOCK_CLOEXEC in socket() and accept() calls # USE_DEVICEATLAS : enable DeviceAtlas api. # USE_51DEGREES : enable third party device detection library from 51Degrees # @@ -267,6 +268,7 @@ ifeq ($(TARGET),linux26) USE_FUTEX = implicit EXTRA += haproxy-systemd-wrapper USE_DL = implicit + SOCK_CLOEXEC = implicit else ifeq ($(TARGET),linux2628) # This is for standard Linux >= 2.6.28 with netfilter, epoll, tproxy and splice @@ -285,6 +287,7 @@ ifeq ($(TARGET),linux2628) EXTRA += haproxy-systemd-wrapper USE_DL = implicit USE_POSIX_SPAWN = implicit + USE_SOCK_CLOEXEC = implicit else ifeq ($(TARGET),solaris) # This is for Solaris 8 @@ -563,6 +566,11 @@ OPTIONS_CFLAGS += -DUSE_POSIX_SPAWN BUILD_OPTIONS += $(call ignore_implicit,USE_POSIX_SPAWN) endif +ifneq ($(USE_SOCK_CLOEXEC),) +OPTIONS_CFLAGS += -DUSE_SOCK_CLOEXEC +BUILD_OPTIONS += $(call ignore_implicit,USE_SOCK_CLOEXEC) +endif + # report DLMALLOC_SRC only if explicitly specified ifneq ($(DLMALLOC_SRC),) diff --git a/include/common/accept4.h b/include/common/accept4.h index 8981b92..cee84d9 100644 --- a/include/common/accept4.h +++ b/include/common/accept4.h @@ -40,6 +40,10 @@ #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif +#ifndef SOCK_CLOEXEC +#define SOCK_CLOEXEC O_CLOEXEC +#endif + #if defined(USE_MY_ACCEPT4) || (!defined(SYS_ACCEPT4) && !defined(__NR_accept4)) #if defined(CONFIG_HAP_LINUX_VSYSCALL) && defined(__linux__) && defined(__i386__) diff --git a/include/proto/fd.h b/include/proto/fd.h index 87309bf..01ef803 100644 --- a/include/proto/fd.h +++ b/include/proto/fd.h @@ -26,6 +26,7 @@ #include <sys/time.h> #include <sys/types.h> #include <unistd.h> +#include <fcntl.h> #include <common/config.h> #include <types/fd.h> @@ -339,6 +340,30 @@ static inline void fd_insert(int fd) maxfd = fd + 1; } +/* Set FD_CLOEXEC on file descriptor to avoid its leakage into children */ +static inline void fd_set_cloexec(int fd) +{ + fcntl(fd, F_SETFD, FD_CLOEXEC); +} + +/* Add SOCK_CLOEXEC, if enabled */ +static inline int sock_type_cloexec(int sock_type) +{ +#ifdef USE_SOCK_CLOEXEC + return sock_type|SOCK_CLOEXEC; +#else /* USE_SOCK_CLOEXEC */ + return sock_type; +#endif /* USE_SOCK_CLOEXEC */ +} + +/* Call fd_set_cloexec, SOCK_CLOEXEC is not enabled*/ +static inline void fd_cond_set_cloexec(int fd) +{ +#ifndef USE_SOCK_CLOEXEC + fd_set_cloexec(fd); +#endif /* USE_SOCK_CLOEXEC */ +} + #endif /* _PROTO_FD_H */ diff --git a/src/checks.c b/src/checks.c index 3df25eb..5354ac6 100644 --- a/src/checks.c +++ b/src/checks.c @@ -1824,14 +1824,8 @@ static int connect_proc_chk(struct task *t) struct proxy *px = s->proxy; int status; pid_t pid; - int fd; - /* In ideal situation, we should close all sockets with CLOEXEC. - * Unfortunately, tests show that closing only stdin/stdout/stderr does not help. - */ - int fd_limit = maxfd; #ifdef USE_POSIX_SPAWN int spawn_res; - posix_spawn_file_actions_t spawn_file_actions; #endif /* USE_POSIX_SPAWN */ status = SF_ERR_RESOURCE; @@ -1842,16 +1836,7 @@ static int connect_proc_chk(struct task *t) /* Should not be a problem */ extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); - /* The correct way is to open sockets with SOCK_CLOEXEC and/or set FD_CLOEXEC later. - This code should be removed when all descriptors get proper CLOEXEC. */ - posix_spawn_file_actions_init(&spawn_file_actions); - - for (fd = 0; fd <= fd_limit; ++fd) { - posix_spawn_file_actions_addclose(&spawn_file_actions, fd); - } - - spawn_res = posix_spawn(&pid, px->check_command, &spawn_file_actions, NULL, check->argv, check->envp); - posix_spawn_file_actions_destroy(&spawn_file_actions); + spawn_res = posix_spawn(&pid, px->check_command, NULL, NULL, check->argv, check->envp); if (spawn_res != 0) { Alert("Failed to fork process for external health check: %s. Aborting.\n", @@ -1873,12 +1858,6 @@ static int connect_proc_chk(struct task *t) /* Child */ extern char **environ; - /* The correct way is to open sockets with SOCK_CLOEXEC and/or set FD_CLOEXEC later. - This code should be removed when all descriptors get proper CLOEXEC. */ - for (fd = 0; fd <= fd_limit; ++fd) { - close(fd); - } - environ = check->envp; extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); execvp(px->check_command, check->argv); diff --git a/src/dns.c b/src/dns.c index 3b3dfc5..cc35ea7 100644 --- a/src/dns.c +++ b/src/dns.c @@ -918,13 +918,16 @@ int dns_init_resolvers(void) dgram->data = &resolve_dgram_cb; /* create network UDP socket for this nameserver */ - if ((fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) { + if ((fd = socket(AF_INET, sock_type_cloexec(SOCK_DGRAM), IPPROTO_UDP)) == -1) { Alert("Starting [%s/%s] nameserver: can't create socket.\n", curr_resolvers->id, curnameserver->id); free(dgram); dgram = NULL; return 0; } + + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(fd); /* "connect" the UDP socket to the name server IP */ if (connect(fd, (struct sockaddr*)&curnameserver->addr, get_addr_len(&curnameserver->addr)) == -1) { diff --git a/src/listener.c b/src/listener.c index 59385f0..7f0e8d6 100644 --- a/src/listener.c +++ b/src/listener.c @@ -387,12 +387,19 @@ void listener_accept(int fd) * fallback to the legacy accept() + fcntl(). */ if (unlikely(accept4_broken || - ((cfd = accept4(fd, (struct sockaddr *)&addr, &laddr, SOCK_NONBLOCK)) == -1 && + ((cfd = accept4(fd, (struct sockaddr *)&addr, &laddr, SOCK_NONBLOCK|SOCK_CLOEXEC)) == -1 && (errno == ENOSYS || errno == EINVAL || errno == EBADF) && - (accept4_broken = 1)))) + (accept4_broken = 1)))) { #endif - if ((cfd = accept(fd, (struct sockaddr *)&addr, &laddr)) != -1) + if ((cfd = accept(fd, (struct sockaddr *)&addr, &laddr)) != -1) { + fd_set_cloexec(cfd); fcntl(cfd, F_SETFL, O_NONBLOCK); + } +#ifdef USE_ACCEPT4 + } else { + fd_cond_set_cloexec(cfd); + } +#endif if (unlikely(cfd == -1)) { switch (errno) { diff --git a/src/log.c b/src/log.c index 99c89f3..45dae0b 100644 --- a/src/log.c +++ b/src/log.c @@ -31,6 +31,7 @@ #include <types/global.h> #include <types/log.h> +#include <proto/fd.h> #include <proto/frontend.h> #include <proto/proto_http.h> #include <proto/log.h> @@ -1103,12 +1104,15 @@ void __send_log(struct proxy *p, int level, char *message, size_t size, char *sd if (unlikely(*plogfd < 0)) { /* socket not successfully initialized yet */ int proto = logsrv->addr.ss_family == AF_UNIX ? 0 : IPPROTO_UDP; + int sock_type = sock_type_cloexec(SOCK_DGRAM); - if ((*plogfd = socket(logsrv->addr.ss_family, SOCK_DGRAM, proto)) < 0) { + if ((*plogfd = socket(logsrv->addr.ss_family, sock_type, proto)) < 0) { Alert("socket for logger #%d failed: %s (errno=%d)\n", nblogger, strerror(errno), errno); continue; } + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(*plogfd); /* we don't want to receive anything on this socket */ setsockopt(*plogfd, SOL_SOCKET, SO_RCVBUF, &zero, sizeof(zero)); /* does nothing under Linux, maybe needed for others */ diff --git a/src/namespace.c b/src/namespace.c index e5ebfd7..4f078af 100644 --- a/src/namespace.c +++ b/src/namespace.c @@ -14,6 +14,7 @@ #include <common/compiler.h> #include <common/hash.h> #include <common/errors.h> +#include <proto/fd.h> #include <proto/log.h> #include <types/global.h> @@ -101,7 +102,11 @@ int my_socketat(const struct netns_entry *ns, int domain, int type, int protocol if (default_namespace >= 0 && ns && setns(ns->fd, CLONE_NEWNET) == -1) return -1; #endif - sock = socket(domain, type, protocol); + sock = socket(domain, sock_type_cloexec(type), protocol); + + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(sock); + #ifdef CONFIG_HAP_NS if (default_namespace >= 0 && ns && setns(default_namespace, CLONE_NEWNET) == -1) { diff --git a/src/proto_uxst.c b/src/proto_uxst.c index b2a7fe2..537b6ce 100644 --- a/src/proto_uxst.c +++ b/src/proto_uxst.c @@ -127,10 +127,13 @@ static void destroy_uxst_socket(const char *path) * ECONNREFUSED. In this case, we do not touch it because it's used * by some other process. */ - sock = socket(PF_UNIX, SOCK_DGRAM, 0); + sock = socket(PF_UNIX, sock_type_cloexec(SOCK_DGRAM), 0); if (sock < 0) return; + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(sock); + addr.sun_family = AF_UNIX; strncpy(addr.sun_path, path, sizeof(addr.sun_path)); addr.sun_path[sizeof(addr.sun_path) - 1] = 0; @@ -237,13 +240,16 @@ static int uxst_bind_listener(struct listener *listener, char *errmsg, int errle } addr.sun_family = AF_UNIX; - fd = socket(PF_UNIX, SOCK_STREAM, 0); + fd = socket(PF_UNIX, sock_type_cloexec(SOCK_STREAM), 0); if (fd < 0) { err |= ERR_FATAL | ERR_ALERT; msg = "cannot create UNIX socket"; goto err_unlink_back; } + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(fd); + fd_ready: if (fd >= global.maxsock) { err |= ERR_FATAL | ERR_ALERT; @@ -415,6 +421,7 @@ int uxst_pause_listener(struct listener *l) int uxst_connect_server(struct connection *conn, int data, int delack) { int fd; + int sock_type; struct server *srv; struct proxy *be; @@ -434,7 +441,9 @@ int uxst_connect_server(struct connection *conn, int data, int delack) return SF_ERR_INTERNAL; } - if ((fd = conn->t.sock.fd = socket(PF_UNIX, SOCK_STREAM, 0)) == -1) { + sock_type = sock_type_cloexec(SOCK_STREAM); + + if ((fd = conn->t.sock.fd = socket(PF_UNIX, sock_type, 0)) == -1) { qfprintf(stderr, "Cannot get a server socket.\n"); if (errno == ENFILE) { @@ -466,6 +475,9 @@ int uxst_connect_server(struct connection *conn, int data, int delack) return SF_ERR_RESOURCE; } + /* avoid the socket getting affected by child processes */ + fd_cond_set_cloexec(fd); + if (fd >= global.maxsock) { /* do not log anything there, it's a normal condition when this option * is used to serialize connections to a server ! -- 2.1.4 Thanks, Andrey Galkin On Tue, Jun 14, 2016 at 1:55 AM, Andrey G. <[email protected]> wrote: > Hi Willy, > > I believe your workaround fits socket data pollution, but it does not > really help with epoll connection drop. However, instead of forcibly > closing the sockets on each fork(), perhaps, setting FD_CLOEXEC right > after accept()/socket() is much safer and potentially better > performing with large number of descriptors and external-checks. > Alternative way, is to try to use posix_spawn(). > > Also,I have noticed around 0.6% of dropped connections which can be a > consequence of race condition on close() in child. I'll try to look > more. > > Regarding, the epoll connection drop, it seems the issue comes from > epoll FD being shared with child process and therefore affected by > changes in it. > Below is patch which fixes the issues remaining after close() > workaround (at least in my tests). Note: maxsock is unused parameter - > it is used to be utilized only as a hint in pre 2.6.8 kernels. > > Just in case, I contribute the patch according to LICENSE and yes, I > am a bit violating CONTRIBUTING rules. Sorry :( > NOTE: it also requires a separate socket data pollution fix as well! > --- haproxy-1.6.5.orig/src/ev_epoll.c 2016-05-10 16:42:00.000000000 +0300 > +++ haproxy-1.6.5/src/ev_epoll.c 2016-06-14 01:21:45.957098610 +0300 > @@ -176,7 +176,7 @@ REGPRM1 static int _do_init(struct polle > { > p->private = NULL; > > - epoll_fd = epoll_create(global.maxsock + 1); > + epoll_fd = epoll_create1(EPOLL_CLOEXEC); > if (epoll_fd < 0) > goto fail_fd; > > @@ -222,7 +222,7 @@ REGPRM1 static int _do_test(struct polle > { > int fd; > > - fd = epoll_create(global.maxsock + 1); > + fd = epoll_create1(EPOLL_CLOEXEC); > if (fd < 0) > return 0; > close(fd); > @@ -239,7 +239,7 @@ REGPRM1 static int _do_fork(struct polle > { > if (epoll_fd >= 0) > close(epoll_fd); > - epoll_fd = epoll_create(global.maxsock + 1); > + epoll_fd = epoll_create1(EPOLL_CLOEXEC); > if (epoll_fd < 0) > return 0; > return 1; > > Thanks, > Andrey Galkin > Андрей > > > On Mon, Jun 13, 2016 at 10:35 PM, Willy Tarreau <[email protected]> wrote: >> Hi Andrey, >> >> On Sun, Jun 12, 2016 at 04:15:49PM +0300, Andrey G. wrote: >>> Hi, >>> >>> I am seeing quite strange and not simple to reproduce behavior found >>> during stress testing of https://github.com/codingfuture/puppet-cfdb . >>> After a long testing with different options, I am quite sure the issue >>> comes from epoll support as "noepoll" fixes the problem. >>> >>> Here's the scenario: >>> >>> 1. There are several DB clusters accessible through TCP >>> 2. HAProxy 1.6.5 on Debian Jessie accepts clients on UNIX sockets >>> under /run/{service}/ (easier auto-configuration, security and per >>> role maxconn control) >>> 3. external-check python-based (yep..) scripts are used to properly >>> identify ready nodes in clusters >>> 4. As cluster connections may be wrapped in TLS, external-check >>> scripts use dedicated no-check "listen" blocks in HAProxy, but do not >>> connect directly. >>> 5. As there is a known recently discovered issue with stdout/stderr >>> data clobbering, I made sure external-check scripts to close mentioned >>> descriptors (the example below uses /dev/null instead). >> >> I think the issue is very similar. In fact, closing stdin/stdout is not >> enough, we need to close *all* FDs there. epoll uses a specific FD and >> I suspect that it is what is causing the issue. An alternate reason could >> be that its presence simply shifts another FD by 1 and makes it conflict >> with one FD used by the external check. Normal scripts should not assume >> that fds above 2 may be used without initialization. But note that it is >> also possible that epoll is assigned fd #2 and causes all the trouble. >> >> Thus now I'd do this after the fork for external checks instead in >> src/checks.c to guarantee we close all open FDs : >> >> if (pid == 0) { >> /* Child */ >> extern char **environ; >> + int fd; >> + >> + for (fd = 0; fd <= maxfd; fd++) >> + close(fd); >> >> Is it something you could check ? By the way, thanks a lot for your bug >> report, it's very well detailed! >> >> Willy

