Hey Willy,

On 05/19/2014 06:30 AM, Willy Tarreau wrote:
>> Here is a nice way to trigger it:
>>  - run haproxy in deamon mode, but with nbproc 1
>>  - open a HTTP keep-alive connection (and keep it open)
>>  - send USR1 to the only worker (it should stay running, because of the
>>    ongoing connection)
>>  - open a new connection -> the worker goes to 100%
>>
>> Works all the time for me. Running lsof before and after the signal
>> neatly shows how the worker closed the listening socket but the parent
>> still has it.
> 
> I'm just confused with what the "parent" here can be since you have nbproc 1,
> so there's a single process overall. Or maybe you're using the debugger to
> breakpoint the parent process and prevent it from leaving ?
> 
> I was thinking that starting with nbproc 2 with long connections on both
> would cause the issue to happen as well.

I am not sure what the proper terminology is, here. But when you run in
daemon mode with nbproc 1 you get exactly one fork. The output of ps
looks somewhat like this:

18616 ?        S      0:00  |   \_ haproxy-systemd-wrapper
18618 ?        S      0:00  |       \_ /usr/sbin/haproxy
18619 ?        Ss     0:09  |           \_ /usr/sbin/haproxy

I was referring to 18618 as the "parent" process. It never does
anything, just wait()s.

>>> Maybe we could simply have a flag in the fdtab indicating that the fd is
>>> shared (basically all FDs opened before the fork would have it), and decide
>>> in poller.clo() to only call epoll_ctl(DEL) when this flag is set, then 
>>> clear
>>> it. It would seam reasonable in my opinion. Do you think that you could 
>>> extend
>>> your current patch to do that ?
>>
>> Good point on the performance issue. I whipped up a new patch, I hope
>> this is what you meant? I surely haven't yet fully understood all
>> architecture and design decisions, so feel free to nitpick :)
> 
> Yes, that's about the principle, though I thought about doing it slightly
> differently : in my opinion the problem is not caused by the FD being a
> listener, but by it being cloned across the fork(), eventhough for now
> there's a 1:1 relation between the two, that could change in the future
> (eg: a shared inter-process socket for stats or whatever).
> 
> I would have proceeded slightly differently :
> 
>   - in fd_insert(), I'd have added :
> 
>         fdtab[fd].cloned = 0;
> 
>   - in ev_epoll.c:_do_fork(), I'd have marked all the opened FDs as cloned :
> 
>         for (i = 0; i <= maxfd; i++)
>            if (fdtab[fd].owner)
>               fdtab[fd].cloned = 1;
> 
> That way, we'd be sure that all open FDs are properly tagged, whatever they
> are used for.
> 
>> Also, I wasn't sure if this should go into the UNIX sockets, too? Now
>> that I am writing this, I actually suppose so. Let me know if I should
>> add it there, too!
> 
> One more reason for doing the job in the poller, otherwise when adding
> new protocols, it's easy to forget about this!

Sorry, I read your suggestions somewhat too hastily. You are absolutely
right. I modified the patch accordingly, with one improvement
suggestion, though: I marked the fds  as cloned in fork_poller() instead
of evpolls _do_fork. This way, this flag is set regardless of the
protocol and the poller used and any other (future?) pollers could still
use it. Seemed more generic, what do you think? I can also put it into
_do_fork if you prefer that.

Cheers,
Conrad
diff --git a/include/proto/fd.h b/include/proto/fd.h
index 605dc21..b0a478e 100644
--- a/include/proto/fd.h
+++ b/include/proto/fd.h
@@ -335,6 +335,7 @@ static inline void fd_insert(int fd)
 	fdtab[fd].ev = 0;
 	fdtab[fd].new = 1;
 	fdtab[fd].linger_risk = 0;
+	fdtab[fd].cloned = 0;
 	if (fd + 1 > maxfd)
 		maxfd = fd + 1;
 }
diff --git a/include/types/fd.h b/include/types/fd.h
index 1c2c7c8..057d968 100644
--- a/include/types/fd.h
+++ b/include/types/fd.h
@@ -86,6 +86,7 @@ struct fdtab {
 	unsigned char new:1;                 /* 1 if this fd has just been created */
 	unsigned char updated:1;             /* 1 if this fd is already in the update list */
 	unsigned char linger_risk:1;         /* 1 if we must kill lingering before closing */
+	unsigned char cloned:1;              /* 1 if a cloned socket, requires EPOLL_CTL_DEL on close */
 };
 
 /* less often used information */
diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 2849ec6..9d359b2 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -46,6 +46,19 @@ static struct epoll_event ev;
 #endif
 
 /*
+ * Immediately remove file descriptor from epoll set upon close.
+ * Since we forked, some fds share inodes with the other process, and epoll may
+ * send us events even though this process closed the fd (see man 7 epoll,
+ * "Questions and answers", Q 6).
+ */
+REGPRM1 static void __fd_clo(int fd)
+{
+	if (unlikely(fdtab[fd].cloned)) {
+		epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &ev);
+	}
+}
+
+/*
  * Linux epoll() poller
  */
 REGPRM2 static void _do_poll(struct poller *p, int exp)
@@ -267,7 +280,7 @@ static void _do_register(void)
 	p->pref = 300;
 	p->private = NULL;
 
-	p->clo  = NULL;
+	p->clo  = __fd_clo;
 	p->test = _do_test;
 	p->init = _do_init;
 	p->term = _do_term;
diff --git a/src/fd.c b/src/fd.c
index 66f1e8b..c238bc8 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -438,6 +438,13 @@ int list_pollers(FILE *out)
  */
 int fork_poller()
 {
+	int fd;
+	for (fd = 0; fd <= maxfd; fd++) {
+		if (fdtab[fd].owner) {
+			fdtab[fd].cloned = 1;
+		}
+	}
+
 	if (cur_poller.fork) {
 		if (cur_poller.fork(&cur_poller))
 			return 1;

Reply via email to