Hi Bryan,

Here are two fixes that I have pushed to the Git tree. They will be in Friday morning's snapshot, but I'm attaching the patches in the meantime.
The bug is complex to reproduce: it requires specific timing that I can only get with a combination of two machines and a certain number of concurrent connections. It manifests when an error is reported at the same time as a clean connection close. Only the connection close was handled, so the error did not cause an abort of the connection. The problem is that afterwards the error flag was lost while polling remained active, causing the loops you noticed. These loops all eventually terminate thanks to the timeouts in the configuration.

The two patches address different aspects of the issue: the first one fixes a real bug, and the second one addresses a misdesign of mine which fuels the bug.

I'd really like to thank you for the amount of precise information you provided; it was extremely helpful, especially because I was suspecting totally unrelated issues (checks) and would not have found the cause without your help. I'd be happy if you could test to confirm that the issue does not reappear. Once I get your go-ahead (and possibly other pending fixes that might appear in between), I'll issue dev15 to avoid causing issues to other users.

Thanks!
Willy
>From debdc4b657056e517287f03f8439bb8e49be43ba Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 7 Dec 2012 00:01:33 +0100
Subject: BUG/MAJOR: raw_sock: must check error code on hangup

In raw_sock, we already check for FD_POLL_HUP after a short recv() to
avoid a useless syscall and detect the end of stream. However, we fail
to check for FD_POLL_ERR here, which causes major issues as some errors
might be delivered and ignored if they are delivered at the same time
as a HUP, and there is no data to send to detect them on the other
direction.

Since the connections flags do not have the CO_FL_ERROR flag, the
polling is not disabled on the socket and the pollers immediately call
the conn_fd_handler() again, resulting in CPU spikes for as long as the
timeouts allow them.

Note that this patch alone fixes the issue but a few patches will
follow to strengthen this fragile area.

Big thanks to Bryan Berry who reported the issue with significant
amounts of detailed traces that helped rule out many other initially
suspected causes and to finally reproduce the issue in the lab.
---
 src/raw_sock.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/raw_sock.c b/src/raw_sock.c
index 368b9f0..4c83a6f 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -282,6 +282,16 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 read0:
 	conn_sock_read0(conn);
 	conn->flags &= ~CO_FL_WAIT_L4_CONN;
+
+	/* Now a final check for a possible asynchronous low-level error
+	 * report. This can happen when a connection receives a reset
+	 * after a shutdown, both POLL_HUP and POLL_ERR are queued, and
+	 * we might have come from there by just checking POLL_HUP instead
+	 * of recv()'s return value 0, so we have no way to tell there was
+	 * an error without checking.
+	 */
+	if (unlikely(fdtab[conn->t.sock.fd].ev & FD_POLL_ERR))
+		conn->flags |= CO_FL_ERROR;
 	return done;
 }
--
1.7.12.2.21.g234cd45.dirty
>From 26d7cfce328c33e208437b30b4ca3fc2db359cd2 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 7 Dec 2012 00:09:43 +0100
Subject: BUG/MAJOR: polling: do not set speculative events on ERR nor HUP

Errors and Hangups are sticky events, which means that once they're
detected, we never clear them, allowing them to be handled later if
needed.

Till now when an error was reported, it used to register a speculative
I/O event for both recv and send. Since the connection had not
requested such events, it was not able to detect a change and did not
clear them, so the events were called in loops until a timeout caused
their owner task to die.

So this patch does two things :
  - stop registering spec events when no I/O activity was requested, so
    that we don't end up with non-disablable polling state ;

  - keep the sticky polling flags (ERR and HUP) when leaving the
    connection handler so that an error notification doesn't magically
    become a normal recv() or send() report once the event is converted
    to a spec event.

It is normally not needed to make the connection handler emit an error
when it detects POLL_ERR because either a registered data handler will
have done it, or the event will be disabled by the wake() callback.
---
 src/connection.c | 2 +-
 src/ev_epoll.c   | 4 ++--
 src/ev_kqueue.c  | 4 ++--
 src/ev_poll.c    | 4 ++--
 src/ev_select.c  | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/connection.c b/src/connection.c
index e2d75ba..03a9bb0 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -153,7 +153,7 @@ int conn_fd_handler(int fd)
 		conn->flags |= CO_FL_CONNECTED;

 	/* remove the events before leaving */
-	fdtab[fd].ev &= ~(FD_POLL_IN | FD_POLL_OUT | FD_POLL_HUP | FD_POLL_ERR);
+	fdtab[fd].ev &= FD_POLL_STICKY;

 	/* commit polling changes */
 	conn_cond_update_polling(conn);

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 870da10..2cd596a 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -161,10 +161,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 54c52c2..36a7b7f 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -152,10 +152,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_poll.c b/src/ev_poll.c
index 2ce4815..cdc357f 100644
--- a/src/ev_poll.c
+++ b/src/ev_poll.c
@@ -166,10 +166,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);

diff --git a/src/ev_select.c b/src/ev_select.c
index 0ac74e7..356eed7 100644
--- a/src/ev_select.c
+++ b/src/ev_select.c
@@ -161,10 +161,10 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			 * them so that if nothing can be done we don't need
 			 * to poll again.
 			 */
-			if (fdtab[fd].ev & (FD_POLL_IN|FD_POLL_HUP|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_IN)
 				fd_ev_set(fd, DIR_RD);

-			if (fdtab[fd].ev & (FD_POLL_OUT|FD_POLL_ERR))
+			if (fdtab[fd].ev & FD_POLL_OUT)
 				fd_ev_set(fd, DIR_WR);

 			fdtab[fd].iocb(fd);
--
1.7.12.2.21.g234cd45.dirty

