On Wednesday, March 14, 2012 1:35:38 PM UTC-7, Wendy Cheng wrote:

> ok, found the place where it deadlocked. Our completion thread tried
> to complete the io (by notify_io_complete()) for the memcached worker
> thread while holding the cache_lock. The worker thread locked itself
> (thread->mutex) before entering "process_command()"; then tried to
> obtain the cache_lock. This implies I can't call notify_io_complete()
> while holding cache_lock. Guess I need to spend time getting myself
> familiar with the network protocol logic, instead of isolating myself
> within the storage engine.

The intended use of notify_io_complete() should make it fairly hard to deadlock. Typically, a request comes in and you service it immediately. If you can't service the request immediately, you can ask another thread to perform the work (e.g. an existing service pool, or a temporary thread if it's an infrequent thing) and then return EWOULDBLOCK. This causes memcached to remove the connection from its libevent set completely (that is, you are now completely responsible for it). After this, that thread may call notify_io_complete() to have the connection added back into libevent and have memcached reissue the request against your engine.
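A minimal, self-contained sketch of that flow, assuming pthreads. Everything prefixed `sketch_` is hypothetical and only models the shape of the engine interface (ENGINE_EWOULDBLOCK, notify_io_complete()); it is not the real memcached API. The "worker" holds the connection lock while calling the engine, the engine parks the connection and hands the work to a background thread, and that thread finishes its work under cache_lock, releases cache_lock, and only then notifies. `pthread_join()` stands in for libevent waking the connection again.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical status codes, loosely modeled on ENGINE_SUCCESS /
 * ENGINE_EWOULDBLOCK; these are NOT the real engine definitions. */
typedef enum { SKETCH_SUCCESS, SKETCH_EWOULDBLOCK, SKETCH_ERROR } sketch_status;

/* Hypothetical connection. The worker holds c->mutex while it calls into
 * the engine, like thread->mutex around process_command(). */
typedef struct {
    pthread_mutex_t mutex;
    bool parked;      /* removed from the event loop after EWOULDBLOCK */
    bool io_complete; /* set once the background work has finished */
    int result;       /* value produced by the background work */
} sketch_conn;

/* Hypothetical engine-wide lock and state (the "cache_lock"). */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_value = 0;

static void sketch_conn_init(sketch_conn *c) {
    pthread_mutex_init(&c->mutex, NULL);
    c->parked = false;
    c->io_complete = false;
    c->result = 0;
}

/* Stand-in for the server's notify_io_complete(): it takes the connection
 * lock, so the caller must NOT be holding cache_lock here. */
static void sketch_notify_io_complete(sketch_conn *c, int value) {
    pthread_mutex_lock(&c->mutex);
    assert(c->parked); /* only parked connections are ever notified */
    c->result = value;
    c->io_complete = true;
    c->parked = false; /* the server would re-add it to libevent here */
    pthread_mutex_unlock(&c->mutex);
}

/* Background job: slow work under cache_lock, release it, then notify. */
static void *sketch_background_job(void *arg) {
    sketch_conn *c = arg;
    pthread_mutex_lock(&cache_lock);
    cache_value += 42; /* placeholder for the slow storage operation */
    int value = cache_value;
    pthread_mutex_unlock(&cache_lock); /* drop cache_lock BEFORE notifying */
    sketch_notify_io_complete(c, value);
    return NULL;
}

/* Engine entry point; the caller already holds c->mutex. */
static sketch_status sketch_engine_get(sketch_conn *c, pthread_t *job, int *out) {
    if (c->io_complete) { /* reissued request: the answer is ready */
        *out = c->result;
        return SKETCH_SUCCESS;
    }
    c->parked = true; /* from now on the engine owns this connection */
    if (pthread_create(job, NULL, sketch_background_job, c) != 0) {
        c->parked = false;
        return SKETCH_ERROR;
    }
    return SKETCH_EWOULDBLOCK;
}

/* Simulated worker: issue the request, park on EWOULDBLOCK, reissue later. */
static int sketch_run_request(sketch_conn *c) {
    pthread_t job;
    int value = -1;
    pthread_mutex_lock(&c->mutex);
    sketch_status st = sketch_engine_get(c, &job, &value);
    pthread_mutex_unlock(&c->mutex);
    if (st == SKETCH_EWOULDBLOCK) {
        pthread_join(job, NULL); /* stand-in for libevent firing again */
        pthread_mutex_lock(&c->mutex);
        st = sketch_engine_get(c, &job, &value); /* the reissued request */
        pthread_mutex_unlock(&c->mutex);
    }
    return st == SKETCH_SUCCESS ? value : -1;
}
```

The only design rule the sketch relies on is that the background thread never holds cache_lock and the connection lock at the same time, so it cannot form the cycle described above with a worker that holds thread->mutex and then waits for cache_lock.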
> It would be nice to know what is a tap thread but I'm ok for now.

The tap thread owns all the tap connections. It's different from the normal protocol worker threads. There's a writeup on tap here: http://code.google.com/p/memcached/wiki/Tap
