Comment #5 on issue 106 by [email protected]: binary protocol parsing can cause memcached server lockup
http://code.google.com/p/memcached/issues/detail?id=106

Hello again! I (the author of http://gitorious.org/snitchaser) disappeared for nearly 10 months and have now come back to continue working on this bug. Snitchaser has since been split into 2 projects: Snitchaser (http://code.google.com/p/snitchaser/) for single-threaded programs and ReBranch (http://code.google.com/p/rebranch/) for multi-threaded programs. I used ReBranch to analyze this bug and solved it in 2 hours. In fact the root cause is simple; I should have discovered it 5 months ago.

The root cause is that once memcached sets a UDP connection to the conn_close state, the connection never comes back. Unlike TCP connections, a memcached server has only 1 UDP connection to serve all UDP requests. Hence, when that connection hits an error, memcached only cleans it up and does not free it. However, memcached never reverts the state of a cleaned-up connection, so the server stops answering UDP requests from then on.
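
To make the failure mode concrete, here is a tiny model of it. This is only a sketch: the struct, the enum and the function names are hypothetical and are not memcached's own code. The revert_udp flag stands in for a fix that puts the UDP connection back into a serving state after cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model. None of these names exist in memcached. */
enum model_state { MODEL_NEW_CMD, MODEL_CLOSING };

struct model_conn {
    bool is_udp;            /* the single shared UDP connection? */
    bool freed;             /* TCP connections are freed on close */
    enum model_state state;
};

/* Error path: a TCP connection is freed (a new one is made on the next
 * accept); the UDP connection is only cleaned up. Without revert_udp it
 * stays in MODEL_CLOSING forever. */
static void model_close(struct model_conn *c, bool revert_udp)
{
    c->state = MODEL_CLOSING;
    if (!c->is_udp) {
        c->freed = true;
        return;
    }
    if (revert_udp)
        c->state = MODEL_NEW_CMD;
}

/* A connection can serve requests only if it is alive and ready. */
static bool model_can_serve(const struct model_conn *c)
{
    return !c->freed && c->state == MODEL_NEW_CMD;
}

/* Walks through both behaviours; aborts if the model disagrees. */
static void model_self_check(void)
{
    struct model_conn udp = { true, false, MODEL_NEW_CMD };

    model_close(&udp, false);       /* cleanup only: stuck */
    assert(!model_can_serve(&udp));

    model_close(&udp, true);        /* cleanup plus state revert */
    assert(model_can_serve(&udp));
}
```

Calling model_self_check() runs both cases: without the revert the only UDP connection can never serve again, with the revert it can.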

Here we suggest a patch for memcached 1.3.5:

$ diff -u ./memcached.ori.c ./memcached.c
--- ./memcached.ori.c   2011-06-27 22:16:22.401000079 +0800
+++ ./memcached.c       2011-06-27 22:17:26.102000078 +0800
@@ -471,6 +471,11 @@
         sasl_dispose(&c->sasl_conn);
         c->sasl_conn = NULL;
     }
+
+    if (IS_UDP(c->transport)) {
+        recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
+        conn_set_state(c, conn_new_cmd);
+    }
 }

 /*
@@ -3223,7 +3228,7 @@
         res -= 8;
         memmove(c->rbuf, c->rbuf + 8, res);

-        c->rbytes += res;
+        c->rbytes = res;
         c->rcurr = c->rbuf;
         return READ_DATA_RECEIVED;
     }

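For a UDP connection, the first hunk consumes the pending packet from the socket on close and then resets the state to conn_new_cmd. The second hunk sets c->rbytes to the length of the data just received instead of adding it to the old count, which matches the memmove to the start of c->rbuf and the reset of c->rcurr.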
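
The "consume the packet" step can be tried outside memcached. The following is a minimal sketch, not the patch itself: drain_one_datagram and drain_demo are hypothetical names, a 1-byte scratch buffer is used instead of the NULL buffer in the patch, MSG_DONTWAIT is assumed to be available (it is on Linux and the BSDs), and an AF_UNIX datagram socketpair stands in for the real UDP socket so that no network setup is needed.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: discard at most one queued datagram.
 * Returns 1 if a datagram was discarded, 0 if the queue was empty,
 * -1 on any other error. Reading fewer bytes than the datagram holds
 * still removes the whole datagram from the queue. */
static int drain_one_datagram(int fd)
{
    char scratch;
    ssize_t n = recvfrom(fd, &scratch, sizeof scratch, MSG_DONTWAIT,
                         NULL, NULL);

    if (n >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}

/* Hypothetical demo: queue one junk datagram, drain it, then check that
 * the queue is empty. Returns 0 on success, -1 on failure. */
static int drain_demo(void)
{
    int sv[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    ok = send(sv[0], "junk", 4, 0) == 4
      && drain_one_datagram(sv[1]) == 1   /* bad packet thrown away */
      && drain_one_datagram(sv[1]) == 0;  /* nothing left to read */

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

The point of draining before the state reset is that the offending packet would otherwise still be the first thing read after the connection returns to conn_new_cmd.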

