Hi Bryan,

Here are two fixes that I have pushed to the Git tree. They will be in Friday morning's snapshot, but I'm attaching the patches in the meantime.
The bug is complex to reproduce: it requires specific timing that I can only get with a combination of two machines and a certain number of concurrent connections. It manifests when an error is reported at the same time as a clean connection close. Only the connection close was handled, so the error did not cause an abort of the connection. The problem is that afterwards the error flag was lost while polling remained active, causing the loops you noticed. These loops all eventually terminate thanks to the timeouts in the configuration.

The two patches address different aspects of the issue: the first one fixes a real bug, and the second one addresses a misdesign of mine which fuels the bug.

I'd really like to thank you for the amount of precise information you provided; it was extremely helpful, especially because I was suspecting totally unrelated issues (checks) and would not have found the cause without your help. I'd be happy if you could test to confirm that the issue does not reappear. Once I get your go-ahead (and possibly other pending fixes that might appear in between), I'll issue dev15 to avoid causing issues to other users.

Thanks!
Willy
>From debdc4b657056e517287f03f8439bb8e49be43ba Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 7 Dec 2012 00:01:33 +0100
Subject: BUG/MAJOR: raw_sock: must check error code on hangup

In raw_sock, we already check for FD_POLL_HUP after a short recv() to
avoid a useless syscall and detect the end of stream. However, we fail
to check for FD_POLL_ERR here, which causes major issues as some errors
might be delivered and ignored if they are delivered at the same time
as a HUP, and there is no data to send to detect them on the other
direction.

Since the connections flags do not have the CO_FL_ERROR flag, the
polling is not disabled on the socket and the pollers immediately call
the conn_fd_handler() again, resulting in CPU spikes for as long as the
timeouts allow them.

Note that this patch alone fixes the issue but a few patches will
follow to strengthen this fragile area.

Big thanks to Bryan Berry who reported the issue with significant
amounts of detailed traces that helped rule out many other initially
suspected causes and to finally reproduce the issue in the lab.
---
 src/raw_sock.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/raw_sock.c b/src/raw_sock.c
index 368b9f0..4c83a6f 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -282,6 +282,16 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 read0:
 	conn_sock_read0(conn);
 	conn->flags &= ~CO_FL_WAIT_L4_CONN;
+
+	/* Now a final check for a possible asynchronous low-level error
+	 * report. This can happen when a connection receives a reset
+	 * after a shutdown, both POLL_HUP and POLL_ERR are queued, and
+	 * we might have come from there by just checking POLL_HUP instead
+	 * of recv()'s return value 0, so we have no way to tell there was
+	 * an error without checking.
+	 */
+	if (unlikely(fdtab[conn->t.sock.fd].ev & FD_POLL_ERR))
+		conn->flags |= CO_FL_ERROR;
 	return done;
 }
--
1.7.12.2.21.g234cd45.dirty
>From 26d7cfce328c33e208437b30b4ca3fc2db359cd2 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 7 Dec 2012 00:09:43 +0100
Subject: BUG/MAJOR: polling: do not set speculative events on ERR nor HUP

Errors and Hangups are sticky events, which means that once they're
detected, we never clear them, allowing them to be handled later if
needed.

Till now when an error was reported, it used to register a speculative
I/O event for both recv and send. Since the connection had not
requested such events, it was not able to detect a change and did not
clear them, so the events were called in loops until a timeout caused
their owner task to die.

So this patch does two things :
  - stop registering spec events when no I/O activity was requested, so
    that we don't end up with non-disablable polling state ;

  - keep the sticky polling flags (ERR and HUP) when leaving the
    connection handler so that an error notification doesn't magically
    become a normal recv() or send() report once the event is converted
    to a spec event.

It is normally not needed to make the connection handler emit an error
when it detects POLL_ERR because either a registered data handler will
have done it, or the event will be disabled by the wake() callback.
---
 src/connection.c | 2 +-
 src/ev_epoll.c   | 4 ++--
 src/ev_kqueue.c  | 4 ++--
 src/ev_poll.c    | 4 ++--
 src/ev_select.c  | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/connection.c b/src/connection.c
index e2d75ba..03a9bb0 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -153,7 +153,7 @@ int conn_fd_handler(int fd)
 		conn->flags |= CO_FL_CONNECTED;

 	/* remove the events before leaving */
-	fdtab[fd].ev &= ~(FD_POLL_IN | FD_POLL_OUT | FD_POLL_HUP | FD_POLL_ERR);
+	fdtab[fd].ev &= FD_POLL_STICKY;

 	/* commit polling changes */
 	conn_cond_update_polling(conn);

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 870da10..2cd596a 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -161,10 +161,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 54c52c2..36a7b7f 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -152,10 +152,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_poll.c b/src/ev_poll.c
index 2ce4815..cdc357f 100644
--- a/src/ev_poll.c
+++ b/src/ev_poll.c
@@ -166,10 +166,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_select.c b/src/ev_select.c
index 0ac74e7..356eed7 100644
--- a/src/ev_select.c
+++ b/src/ev_select.c
@@ -161,10 +161,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);
--
1.7.12.2.21.g234cd45.dirty

