Hey Willy,
On 05/19/2014 06:30 AM, Willy Tarreau wrote:
>> Here is a nice way to trigger it:
>> - run haproxy in deamon mode, but with nbproc 1
>> - open a HTTP keep-alive connection (and keep it open)
>> - send USR1 to the only worker (it should stay running, because of the
>> ongoing connection)
>> - open a new connection -> the worker goes to 100%
>>
>> Works all the time for me. Running lsof before and after the signal
>> neatly shows how the worker closed the listening socket but the parent
>> still has it.
>
> I'm just confused with what the "parent" here can be since you have nbproc 1,
> so there's a single process overall. Or maybe you're using the debugger to
> breakpoint the parent process and prevent it from leaving ?
>
> I was thinking that starting with nbproc 2 with long connections on both
> would cause the issue to happen as well.
I am not sure what the proper terminology is, here. But when you run in
daemon mode with nbproc 1 you get exactly one fork. The output of ps
looks somewhat like this:
18616 ? S 0:00 | \_ haproxy-systemd-wrapper
18618 ? S 0:00 | \_ /usr/sbin/haproxy
18619 ? Ss 0:09 | \_ /usr/sbin/haproxy
I was referring to 18618 as the "parent" process. It never does
anything, just wait()s.
>>> Maybe we could simply have a flag in the fdtab indicating that the fd is
>>> shared (basically all FDs opened before the fork would have it), and decide
>>> in poller.clo() to only call epoll_ctl(DEL) when this flag is set, then
>>> clear
>>> it. It would seam reasonable in my opinion. Do you think that you could
>>> extend
>>> your current patch to do that ?
>>
>> Good point on the performance issue. I whipped up a new patch, I hope
>> this is what you meant? I surely haven't yet fully understood all
>> architecture and design decisions, so feel free to nitpick :)
>
> Yes, that's about the principle, though I thought about doing it slightly
> differently : in my opinion the problem is not caused by the FD being a
> listener, but by it being cloned across the fork(), eventhough for now
> there's a 1:1 relation between the two, that could change in the future
> (eg: a shared inter-process socket for stats or whatever).
>
> I would have proceeded slightly differently :
>
> - in fd_insert(), I'd have added :
>
> fdtab[fd].cloned = 0;
>
> - in ev_epoll.c:_do_fork(), I'd have marked all the opened FDs as cloned :
>
> for (i = 0; i <= maxfd; i++)
> if (fdtab[fd].owner)
> fdtab[fd].cloned = 1;
>
> That way, we'd be sure that all open FDs are properly tagged, whatever they
> are used for.
>
>> Also, I wasn't sure if this should go into the UNIX sockets, too? Now
>> that I am writing this, I actually suppose so. Let me know if I should
>> add it there, too!
>
> One more reason for doing the job in the poller, otherwise when adding
> new protocols, it's easy to forget about this!
Sorry, I read your suggestions somewhat too hastily. You are absolutely
right. I modified the patch accordingly, with one improvement
suggestion, though: I marked the fds as cloned in fork_poller() instead
of evpolls _do_fork. This way, this flag is set regardless of the
protocol and the poller used and any other (future?) pollers could still
use it. Seemed more generic, what do you think? I can also put it into
_do_fork if you prefer that.
Cheers,
Conrad
diff --git a/include/proto/fd.h b/include/proto/fd.h
index 605dc21..b0a478e 100644
--- a/include/proto/fd.h
+++ b/include/proto/fd.h
@@ -335,6 +335,7 @@ static inline void fd_insert(int fd)
fdtab[fd].ev = 0;
fdtab[fd].new = 1;
fdtab[fd].linger_risk = 0;
+ fdtab[fd].cloned = 0;
if (fd + 1 > maxfd)
maxfd = fd + 1;
}
diff --git a/include/types/fd.h b/include/types/fd.h
index 1c2c7c8..057d968 100644
--- a/include/types/fd.h
+++ b/include/types/fd.h
@@ -86,6 +86,7 @@ struct fdtab {
unsigned char new:1; /* 1 if this fd has just been created */
unsigned char updated:1; /* 1 if this fd is already in the update list */
unsigned char linger_risk:1; /* 1 if we must kill lingering before closing */
+ unsigned char cloned:1; /* 1 if a cloned socket, requires EPOLL_CTL_DEL on close */
};
/* less often used information */
diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 2849ec6..9d359b2 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -46,6 +46,19 @@ static struct epoll_event ev;
#endif
/*
+ * Immediately remove file descriptor from epoll set upon close.
+ * Since we forked, some fds share inodes with the other process, and epoll may
+ * send us events even though this process closed the fd (see man 7 epoll,
+ * "Questions and answers", Q 6).
+ */
+REGPRM1 static void __fd_clo(int fd)
+{
+ if (unlikely(fdtab[fd].cloned)) {
+ epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &ev);
+ }
+}
+
+/*
* Linux epoll() poller
*/
REGPRM2 static void _do_poll(struct poller *p, int exp)
@@ -267,7 +280,7 @@ static void _do_register(void)
p->pref = 300;
p->private = NULL;
- p->clo = NULL;
+ p->clo = __fd_clo;
p->test = _do_test;
p->init = _do_init;
p->term = _do_term;
diff --git a/src/fd.c b/src/fd.c
index 66f1e8b..c238bc8 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -438,6 +438,13 @@ int list_pollers(FILE *out)
*/
int fork_poller()
{
+ int fd;
+ for (fd = 0; fd <= maxfd; fd++) {
+ if (fdtab[fd].owner) {
+ fdtab[fd].cloned = 1;
+ }
+ }
+
if (cur_poller.fork) {
if (cur_poller.fork(&cur_poller))
return 1;