Hi Willy,

On 05/17/2014 04:39 AM, Willy Tarreau wrote:
> On 05/15/2014 11:26 AM, Conrad Hoffmann wrote:
>> If haproxy has been running for a certain amount of time (this part
>> makes reproduction cumbersome), we quite often run into the following
>> issue when restarting haproxy: if some of the workers have active
>> connections whose termination they must wait for, a fraction of
>> those would start hogging CPU like crazy.
>
> I had already observed that case myself and thought it was gone,
> because as you explain it's very hard to reproduce.
After sleeping a night over it, I came up with a sure-fire way to
reproduce it. See below.

>> Attaching a debugger showed
>> that epoll_wait() would immediately return with events that they did
>> not process because they had already closed the file descriptor (and
>> thus owner == 0x0).
>
> Could you please confirm that it happens with the listening sockets ?

Yes. I ran lsof on all processes before sending USR1, then attached a
debugger to the one gone haywire and checked the fd in the epoll event.
It was one of the original listening sockets.

Here is a nice way to trigger it:
- run haproxy in daemon mode, but with nbproc 1
- open an HTTP keep-alive connection (and keep it open)
- send USR1 to the only worker (it should stay running because of the
  ongoing connection)
- open a new connection -> the worker goes to 100%

This works every time for me. Running lsof before and after the signal
neatly shows how the worker closed the listening socket while the
parent still has it open.

>> However, based on the above assumption, I came up with the attached
>> patch as a remedy. Basically it just implements the poller.clo(fd)
>> handler for epoll, which is already implemented for the other
>> pollers. I can't say with certainty, but while we had a bug hit rate
>> of at least 80% before, I have not been able to trigger the bug in 5
>> attempts with the patch applied, so I currently think it actually
>> works.
>
> I'm absolutely sure that your patch fixes the issue, because it
> removes the event before getting rid of the reference to the FD.
> However, I have a problem with this approach: it will add an extra
> syscall to *every* connection, which is quite noticeable in a number
> of environments. Ideally we should extend the poller's API to have a
> "close-shared" function or something like this, which would be used
> only for listening FDs that are shared between processes.
>
> Maybe we could simply have a flag in the fdtab indicating that the fd
> is shared (basically all FDs opened before the fork would have it),
> and decide in poller.clo() to only call epoll_ctl(DEL) when this flag
> is set, then clear it. It would seem reasonable in my opinion. Do you
> think that you could extend your current patch to do that ?

Good point on the performance issue. I whipped up a new patch; I hope
this is what you meant? I surely haven't yet fully understood all the
architecture and design decisions, so feel free to nitpick :)

Also, I wasn't sure whether this should go into the UNIX sockets, too.
Now that I am writing this, I actually suppose so. Let me know if I
should add it there as well!

Enjoy your weekend,
Conrad
diff --git a/include/types/fd.h b/include/types/fd.h
index 1c2c7c8..12d5b6b 100644
--- a/include/types/fd.h
+++ b/include/types/fd.h
@@ -86,6 +86,7 @@ struct fdtab {
 	unsigned char new:1;         /* 1 if this fd has just been created */
 	unsigned char updated:1;     /* 1 if this fd is already in the update list */
 	unsigned char linger_risk:1; /* 1 if we must kill lingering before closing */
+	unsigned char listener:1;    /* 1 if a listening socket, requires EPOLL_CTL_DEL on close */
 };
 
 /* less often used information */
diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 2849ec6..46e5f30 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -46,6 +46,19 @@ static struct epoll_event ev;
 #endif
 
 /*
+ * Immediately remove file descriptor from epoll set upon close.
+ * Since we forked, some fds share inodes with the other process, and epoll may
+ * send us events even though this process closed the fd (see man 7 epoll,
+ * "Questions and answers", Q 6).
+ */
+REGPRM1 static void __fd_clo(int fd)
+{
+	if (unlikely(fdtab[fd].listener)) {
+		epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &ev);
+	}
+}
+
+/*
  * Linux epoll() poller
  */
 REGPRM2 static void _do_poll(struct poller *p, int exp)
@@ -267,7 +280,7 @@ static void _do_register(void)
 	p->pref = 300;
 	p->private = NULL;
 
-	p->clo = NULL;
+	p->clo = __fd_clo;
 	p->test = _do_test;
 	p->init = _do_init;
 	p->term = _do_term;
diff --git a/src/proto_tcp.c b/src/proto_tcp.c
index a672de4..8d4acaf 100644
--- a/src/proto_tcp.c
+++ b/src/proto_tcp.c
@@ -882,6 +882,7 @@ int tcp_bind_listener(struct listener *listener, char *errmsg, int errlen)
 
 	fdtab[fd].owner = listener; /* reference the listener instead of a task */
 	fdtab[fd].iocb = listener->proto->accept;
+	fdtab[fd].listener = 1; /* epoll poller needs to know this */
 	fd_insert(fd);
 
  tcp_return: