Comment #5 on issue 106 by [email protected]: binary protocol parsing can
cause memcached server lockup
http://code.google.com/p/memcached/issues/detail?id=106
Hello again! I (the author of http://gitorious.org/snitchaser) have been away
for nearly 10 months, and I am now back to continue working on this bug.
Snitchaser has since been split into 2 projects: Snitchaser
(http://code.google.com/p/snitchaser/) for single-threaded programs and
ReBranch (http://code.google.com/p/rebranch/) for multi-threaded programs.
I used ReBranch to analyze this bug and solved it in 2 hours. The root cause
is actually simple; I should have discovered it 5 months ago.
The root cause is that once memcached sets a UDP connection to the conn_close
state, the connection never comes back. Unlike TCP connections, a memcached
server has only 1 UDP connection, which serves all UDP requests. Hence, for a
connection that hits an error, memcached only cleans up the UDP connection
rather than freeing it. However, memcached never reverts the state of a
cleaned-up connection, so the shared UDP connection stays stuck in that state.
Here we suggest a patch for memcached 1.3.5:
$ diff -u ./memcached.ori.c ./memcached.c
--- ./memcached.ori.c 2011-06-27 22:16:22.401000079 +0800
+++ ./memcached.c 2011-06-27 22:17:26.102000078 +0800
@@ -471,6 +471,11 @@
         sasl_dispose(&c->sasl_conn);
         c->sasl_conn = NULL;
     }
+
+    if (IS_UDP(c->transport)) {
+        recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
+        conn_set_state(c, conn_new_cmd);
+    }
 }
 
 /*
@@ -3223,7 +3228,7 @@
         res -= 8;
         memmove(c->rbuf, c->rbuf + 8, res);
 
-        c->rbytes += res;
+        c->rbytes = res;
         c->rcurr = c->rbuf;
         return READ_DATA_RECEIVED;
     }
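For a UDP connection, the first hunk makes the cleanup code consume the
pending datagram in the socket (a zero-length recvfrom()) and then reset the
state to conn_new_cmd, so the shared connection can serve the next request.
The second hunk sets c->rbytes to the number of payload bytes just moved to
the start of c->rbuf, instead of adding them to a count that may be left over
from an earlier read.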