On Wed, Mar 14, 2012 at 11:02 AM, Wendy Cheng <[email protected]> wrote:
> This mutex seems to exist only in the Storage Engine branch.
>
> We're prototyping a new storage engine on top of an NVRAM
> device. This morning, the test server hung. Four worker threads are
> blocked waiting for cache_lock. The cache_lock holder is blocked in
> notify_io_complete(), waiting for the thread mutex (through the
> LOCK_THREAD macro). That holder is our add-on completion handler
> thread, which is not dispatched via memcached's existing
> event_handler(). Examining the source code by eye, I can't find
> anywhere else that LOCK_THREAD could be invoked.
>
> I'm still checking the code. In the meantime, could folks share the
> following info to speed up the debugging:
>
> 1) What is a "tap_thread"?
> 2) What structures are protected by this mutex?
>

OK, I found where it deadlocked. Our completion thread called
notify_io_complete() to complete the I/O for a memcached worker thread
while holding cache_lock. The worker thread had locked its own
thread->mutex before entering process_command() and then tried to
obtain cache_lock. The two threads take the same pair of locks in
opposite order, so each waits on the other forever. This means I can't
call notify_io_complete() while holding cache_lock. I guess I need to
spend some time getting familiar with the network protocol logic
instead of working in isolation inside the storage engine.
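A minimal sketch of the lock-ordering fix, in C with POSIX threads. It assumes simplified stand-ins: `cache_lock`, a per-worker `mutex` (playing `thread->mutex`) and `notify_io_complete()` borrow names from this thread, but the struct, `worker_main()`, `completion_main()` and `run_demo()` are hypothetical and are not memcached's actual code. The worker takes its own mutex and then `cache_lock`; the completion thread finishes its cache work, drops `cache_lock`, and only then notifies, so no thread ever waits for the worker mutex while holding `cache_lock`.

```c
#include <pthread.h>

/* Hypothetical, simplified stand-ins for the locks discussed above;
 * not memcached's real structures. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker {
    pthread_mutex_t mutex;  /* plays the role of thread->mutex */
    int io_completed;       /* set once the engine reports a finished I/O */
};

/* Hypothetical notify_io_complete(): like LOCK_THREAD, it takes the
 * worker's mutex before touching the worker's state. */
static void notify_io_complete(struct worker *w)
{
    pthread_mutex_lock(&w->mutex);
    w->io_completed = 1;
    pthread_mutex_unlock(&w->mutex);
}

/* Worker path: thread->mutex first, then cache_lock. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    pthread_mutex_lock(&w->mutex);
    pthread_mutex_lock(&cache_lock);
    /* ... process_command() would read or update the cache here ... */
    pthread_mutex_unlock(&cache_lock);
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Completion path (fixed): do the cache work under cache_lock, release it,
 * then notify. Notifying while still holding cache_lock would take the two
 * locks in the opposite order from worker_main() and can deadlock. */
static void *completion_main(void *arg)
{
    struct worker *w = arg;
    pthread_mutex_lock(&cache_lock);
    /* ... update item state for the finished NVRAM I/O ... */
    pthread_mutex_unlock(&cache_lock);
    notify_io_complete(w);  /* called with no cache_lock held */
    return NULL;
}

/* Runs one worker and one completion thread `rounds` times; returns the
 * number of completed notifications, or -1 if a thread could not start. */
int run_demo(int rounds)
{
    int completed = 0;
    for (int i = 0; i < rounds; i++) {
        struct worker w = { .io_completed = 0 };
        pthread_t worker_tid, completion_tid;
        pthread_mutex_init(&w.mutex, NULL);
        if (pthread_create(&worker_tid, NULL, worker_main, &w) != 0) {
            pthread_mutex_destroy(&w.mutex);
            return -1;
        }
        if (pthread_create(&completion_tid, NULL, completion_main, &w) != 0) {
            pthread_join(worker_tid, NULL);
            pthread_mutex_destroy(&w.mutex);
            return -1;
        }
        pthread_join(worker_tid, NULL);
        pthread_join(completion_tid, NULL);
        pthread_mutex_destroy(&w.mutex);
        completed += w.io_completed;
    }
    return completed;
}
```

With this ordering, `run_demo(1000)` should return 1000. Moving the `notify_io_complete(w)` call above `pthread_mutex_unlock(&cache_lock)` in `completion_main()` recreates the inverted order described above and can hang on an unlucky interleaving.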

It would be nice to know what a tap thread is, but I'm OK for now.

Thanks,
Wendy
