Seems like you're right.. I'd re-arranged where the LRU lock (cache_lock)
is called then forgot to update that one bit. Most of the do_item_unlink
code is safe there, until it gets into the LRU bits. It's unlikely anyone
actually saw a crash from this as it's a narrow race though.

That's easy to fix. It's still necessary to delete it, since threads can
stack around a handful of objects and cause rebalance to hang.

Thanks!

On Thu, 3 Jul 2014, Zhiwei Chan wrote:

>  the item lock can only protect the hash list, but what about the LRU list? 
> As far as i know, if trying to delete a node from a doubly-linked-list,
> it is necessary to lock at least 3 node: node, node->pre, node->next. I will 
> try to check if it may crash the LRU list in gdb next week .
>   And I think in do_item_get it is not necessary to delete the item that is 
> re-balanced, just leave it there and return NULL seems better.
>
> 在 2014年7月3日星期四UTC+8下午1时30分29秒,Dormando写道:
>       the item lock is already held for that key when do_item_get is called,
>       which is why the nolock code is called there.
>
>       slab rebalance has that second short-circuiting of fetches to ensure 
> very
>       hot items don't permanently jam a page move.
>
>       On Wed, 2 Jul 2014, Zhiwei Chan wrote:
>
>       > Hi all,   I have thought carefully about the the thread-safe 
> memcached recently, and found that if the re-balance is running, it may
>       not
>       > thread-safety. The code "do_item_get->do_item_unlink_nolock" may 
> corrupt the hash table. Whenever it trying to modify the hash table,
>       it should get
>       > cache_lock, but the function do_item_get have not got the cache_lock.
>       >    Please tell me if anything i neglected.
>       >
>       > /** wrapper around assoc_find which does the lazy expiration logic */
>       > item *do_item_get(const char *key, const size_t nkey, const uint32_t 
> hv) {
>       >     //mutex_lock(&cache_lock);
>       >     item *it = assoc_find(key, nkey, hv);
>       >     if (it != NULL) {
>       >         refcount_incr(&it->refcount);
>       >         /* Optimization for slab reassignment. prevents popular items 
> from
>       >          * jamming in busy wait. Can only do this here to satisfy 
> lock order
>       >          * of item_lock, cache_lock, slabs_lock. */
>       >         if (slab_rebalance_signal &&
>       >             ((void *)it >= slab_rebal.slab_start && (void *)it < 
> slab_rebal.slab_end)) {
>       >             do_item_unlink_nolock(it, hv);   
> -------------------------------------------------------------------> no lock 
> before
>       unlink.
>       >             do_item_remove(it);
>       >             it = NULL;
>       >         }
>       >     }
>       >
>       > --
>       >
>       > ---
>       > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>       > To unsubscribe from this group and stop receiving emails from it, 
> send an email to [email protected].
>       > For more options, visit https://groups.google.com/d/optout.
>       >
>       >
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to