Hi Willy,

On 05/17/2014 04:39 AM, Willy Tarreau wrote:
> On 05/15/2014 11:26 AM, Conrad Hoffmann wrote:
>> If haproxy has been running for a certain amount of time (this part
>> makes reproduction cumbersome), we quite often run into the following
>> issue when restarting haproxy: if some of the workers have active
>> connections whose termination they must wait for, a fraction of
>> those workers start hogging CPU like crazy.
> 
> I had already observed that case myself and thought it was gone because
> as you explain it's very hard to reproduce.

After sleeping on it for a night, I came up with a sure-fire way to
reproduce it. See below.

>> Attaching a debugger showed
>> that epoll_wait() would immediately return with events that the workers
>> did not process, because they had already closed the file descriptor
>> (and thus owner == 0x0).
> 
> Could you please confirm that it happens with the listening sockets?

Yes. I ran lsof on all processes before sending USR1, then attached a
debugger to the one gone haywire and checked the fd in the epoll event.
It was one of the original listening sockets.
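
To double-check that this is really the documented epoll behaviour and
not something haproxy-specific, here is a minimal stand-alone demo I
sketched (my own toy program, not haproxy code; dup() plays the role of
the parent process still holding the fd after fork()):

#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	int pfd[2];

	if (pipe(pfd) < 0)
		return 1;

	int ep = epoll_create1(0);
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };

	epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);

	int keeper = dup(pfd[0]); /* second ref to the same file description */
	close(pfd[0]);            /* our fd is gone, the registration is not */

	write(pfd[1], "x", 1);    /* make the read side ready */

	struct epoll_event out;
	int n = epoll_wait(ep, &out, 1, 0);

	/* reports 1 event for an fd this process can no longer service,
	 * exactly as described in man 7 epoll, "Questions and answers", Q6 */
	printf("%d event(s) for closed fd %d\n", n, out.data.fd);
	return 0;
}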

Here is a nice way to trigger it:
 - run haproxy in daemon mode, but with nbproc 1
 - open an HTTP keep-alive connection (and keep it open)
 - send USR1 to the only worker (it should keep running because of the
   ongoing connection)
 - open a new connection -> the worker goes to 100% CPU

Works all the time for me. Running lsof before and after the signal
neatly shows how the worker closed the listening socket but the parent
still has it.

>> However, based on the above assumption, I came up with the attached
>> patch as a remedy. Basically it just implements the poller.clo(fd)
>> handler for epoll, which is already implemented for other pollers. I
>> can't say with certainty, but while we had a bug hit rate of at least 80%
>> before, I have not been able to trigger the bug in 5 attempts with the
>> patch applied, so I currently think it actually works.
> 
> I'm absolutely sure that your patch fixes the issue because it removes the
> event before getting rid of the reference to the FD. However, I have a problem
> with this approach: it will add an extra syscall to *every* connection, which
> is quite noticeable in a number of environments. Ideally we should extend the
> poller's API to have a "close-shared" function or something like this which
> would be used only for listening FDs that are shared between processes.
> 
> Maybe we could simply have a flag in the fdtab indicating that the fd is
> shared (basically all FDs opened before the fork would have it), and decide
> in poller.clo() to only call epoll_ctl(DEL) when this flag is set, then clear
> it. It would seem reasonable in my opinion. Do you think that you could extend
> your current patch to do that ?

Good point on the performance issue. I whipped up a new patch; I hope
this is what you meant. I certainly haven't fully understood all the
architecture and design decisions yet, so feel free to nitpick :)

Also, I wasn't sure whether this should go into the UNIX sockets, too.
Now that I am writing this, I actually suppose so. Let me know if I
should add it there as well!
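
If so, I'd expect it to be the same one-liner next to the fd setup in
src/proto_uxst.c, roughly like this (untested, and assuming that
uxst_bind_listener fills in fdtab the same way tcp_bind_listener does):

	fdtab[fd].owner = listener; /* reference the listener instead of a task */
	fdtab[fd].iocb = listener->proto->accept;
	fdtab[fd].listener = 1; /* epoll poller needs to know this */
	fd_insert(fd);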

Enjoy your weekend,
Conrad
diff --git a/include/types/fd.h b/include/types/fd.h
index 1c2c7c8..12d5b6b 100644
--- a/include/types/fd.h
+++ b/include/types/fd.h
@@ -86,6 +86,7 @@ struct fdtab {
 	unsigned char new:1;                 /* 1 if this fd has just been created */
 	unsigned char updated:1;             /* 1 if this fd is already in the update list */
 	unsigned char linger_risk:1;         /* 1 if we must kill lingering before closing */
+	unsigned char listener:1;            /* 1 if a listening socket, requires EPOLL_CTL_DEL on close */
 };
 
 /* less often used information */
diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 2849ec6..46e5f30 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -46,6 +46,19 @@ static struct epoll_event ev;
 #endif
 
 /*
+ * Immediately remove file descriptor from epoll set upon close.
+ * Since we forked, some fds share inodes with the other process, and epoll may
+ * send us events even though this process closed the fd (see man 7 epoll,
+ * "Questions and answers", Q 6).
+ */
+REGPRM1 static void __fd_clo(int fd)
+{
+	if (unlikely(fdtab[fd].listener)) {
+		epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &ev);
+	}
+}
+
+/*
  * Linux epoll() poller
  */
 REGPRM2 static void _do_poll(struct poller *p, int exp)
@@ -267,7 +280,7 @@ static void _do_register(void)
 	p->pref = 300;
 	p->private = NULL;
 
-	p->clo  = NULL;
+	p->clo  = __fd_clo;
 	p->test = _do_test;
 	p->init = _do_init;
 	p->term = _do_term;
diff --git a/src/proto_tcp.c b/src/proto_tcp.c
index a672de4..8d4acaf 100644
--- a/src/proto_tcp.c
+++ b/src/proto_tcp.c
@@ -882,6 +882,7 @@ int tcp_bind_listener(struct listener *listener, char *errmsg, int errlen)
 
 	fdtab[fd].owner = listener; /* reference the listener instead of a task */
 	fdtab[fd].iocb = listener->proto->accept;
+	fdtab[fd].listener = 1; /* epoll poller needs to know this */
 	fd_insert(fd);
 
  tcp_return:
