Hi everyone,
I am still somewhat new to haproxy, so I maybe missing a few bits and
pieces here. If so, don't hesitate to educate me :)
First of all, we are using HAProxy here at SoundCloud, so a big thanks
to everyone who invested time in this wonderful project!
We are very keen on using SSL termination with HAProxy, so we have been
running the 1.5-dev builds on non-critical infrastructure. We kept
running into one problem though. Sorry for this rather long description,
put reproduction is kind of difficult, so I want to mention everything
that might be related.
We use "nbproc 12". To restart HAProxy, we send USR1 to the worker
processes. Usually, as expected, they stop accepting connections, shut
down once all connections are terminated, then the parent process
terminates and gets restarted by the systemd-wrapper (I am aware we
could/should send USR2 to the systemd-wrapper, but I think the use case
is still valid).
If haproxy has been running for a certain amount of time (this part
makes reproduction cumbersome), we quite often run into the following
issue when restarting haproxy: if some of the workers have active
connections that they must wait for to be terminated, a fraction of
those would start hogging CPU like crazy. Attaching a debugger showed
that epoll_wait() would immediately return with events that they did not
process because they had already closed the file descriptor (and thus
owner == 0x0). This led me to the epoll(7) man page [1], which in its
Q&A section states for Q6 that just closing a fd may not be sufficient
if the inode referred to by that fd is still open in another process,
which is the case when you fork() with open fds like haproxy does.
The scenario I constructed from this information (and here I am still
missing some knowledge) is that if a processes has a connection going on
when the signal hits, it can happen that it closes a file descriptor
that it doesn't have any connections on but then, while polling for
events on the remaining connection, receives events for the closed fd
because that one is still open in another process. Sorry for this being
kind of vague, but I am still trying to understand the relevant code paths.
However, based on the above assumption, I came up with the attached
patch as a remedy. Basically it just implements the poller.clo(fd)
handler for epoll, which is already implemented for other pollers. I
can't say with certainty, but while we had bug hit rate of at least 80%
before, I have not been able to trigger the bug in 5 attempts with the
patch applied, so I currently think it actually works.
I would love to hear some feedback on this issue, maybe someone can give
insight on the actual code path that might be at hand here, or possibly
any side effects this patch might introduce that I have not thought of, etc.
Thanks a lot and keep being awesome!
Conrad
[1] http://linux.die.net/man/7/epoll
diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 2849ec6..9a704a3 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -46,6 +46,17 @@ static struct epoll_event ev;
#endif
/*
+ * Immediately remove file descriptor from epoll set upon close.
+ * Since we forked, some fds share inodes with the other process, and epoll may
+ * send us events even though this process closed the fd (see man 7 epoll,
+ * "Questions and answers", Q 6).
+ */
+REGPRM1 static void __fd_clo(int fd)
+{
+ epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &ev);
+}
+
+/*
* Linux epoll() poller
*/
REGPRM2 static void _do_poll(struct poller *p, int exp)
@@ -267,7 +278,7 @@ static void _do_register(void)
p->pref = 300;
p->private = NULL;
- p->clo = NULL;
+ p->clo = __fd_clo;
p->test = _do_test;
p->init = _do_init;
p->term = _do_term;