Comment #7 on issue 370 by [email protected]: memcached(v1.4.15) dead
loop bug.
http://code.google.com/p/memcached/issues/detail?id=370
Thanks for dormando to pull me back to the do_item_alloc function, and i
also read again carefully about how to reproduce and fix the issue #260. It
is awful to find the extremely path to reproduce the bug. But i m afraid
there MAY have other problems, even at vervion .20, more higher
probability.
reproduce step by step in v 1.4.20:
1) hash expanding, switch the item_lock to global_lock;
2) In thread A, do_item_alloc try to get a new item, and try to get a ITEM
from the LRU tail. but it does not lock the ITEM, because it try to
hold the item_locks[];
3) after do_item_alloc check the refcount(after line items.c:139), other
thread B try to hold the ITEM(use global_lock), and increase the refcount,
and believe itself hold a refcount;
4) In thread A, do_item_alloc evicted the ITEM, and initialize the
ITEM(reset the refcount to 1), as a new one returning to the caller;
5) In thread B, it call item_remove to de-reference the refcount, and make
the refcount to 0, so the ITEM is free! (the thread A is holding the ITEM,
and the ITME is in the hash table yet);
6) So, any terrible thing may happen, including crash and dead loop.
In version lower than 1.4.19, it easier to occurr even if there is no hash
expanding. Because the code in do_item_alloc as following:
1) if the cur_hv == cur_hv, (it is possible at replace operation);
or
2) if the first loop hv!=cur_hv, and hold a lock; then the second loop
hv==cur_hv(without a lock, but the lock's pointer is not reset in the first
loop!), then release the lock wrong.
for (; tries > 0 && search != NULL; tries--, search=search->prev) {
uint32_t hv = hash(ITEM_key(search), search->nkey, 0);
/* Attempt to hash item lock the "search" item. If locked, no
* other callers can incr the refcount
*/
/* FIXME: I think we need to mask the hv here for comparison? */
if (hv != cur_hv && (hold_lock = item_trylock(hv)) == NULL)
------>
continue;
/* Now see if the item is refcount locked */
if (refcount_incr(&search->refcount) != 2) {
refcount_decr(&search->refcount);
/* Old rare bug could cause a refcount leak. We haven't seen
* it in years, but we leave this code in to prevent failures
* just in case */
if (search->time + TAIL_REPAIR_TIME < current_time) {
itemstats[id].tailrepairs++;
search->refcount = 1;
do_item_unlink_nolock(search, hv);
}
if (hold_lock)
item_trylock_unlock(hold_lock);
continue;
}
I think it's clear enough, but i also try to show more detail as following,
and try to reproduce the phenomenon in gdb:
I don't know why the function "assoc_maintenance_thread" write as
following, but other threads will get the global_lock
before "assoc_maintenance_thread" get it again.
static void *assoc_maintenance_thread(void *arg) {
while (do_run_maintenance_thread) {
int ii = 0;
/* Lock the cache, and bulk move multiple buckets to the new
* hash table. */
item_lock_global(); ----------------> step 5) try
to get the global_lock, it have to wait for other threads to release the
lock.
mutex_lock(&cache_lock);
for (ii = 0; ii < hash_bulk_move && expanding; ++ii) {
.....
}
mutex_unlock(&cache_lock);
item_unlock_global();
if (!expanding) {
/* finished expanding. tell all threads to use fine-grained
locks */
switch_item_lock_type(ITEM_LOCK_GRANULAR);
slabs_rebalancer_resume();
/* We are done expanding.. just wait for next invocation */
mutex_lock(&cache_lock);
started_expanding = false;
pthread_cond_wait(&maintenance_cond, &cache_lock);
------------>step 1) wait here for expanding notify.
/* Before doing anything, tell threads to use a global lock */
mutex_unlock(&cache_lock);
slabs_rebalancer_pause();
switch_item_lock_type(ITEM_LOCK_GLOBAL); ----------->step 2)
switch to the global_lock, without holding the global_lock.
---- other thread
will get the global_lock first. it is not thread-safe.
mutex_lock(&cache_lock);
assoc_expand(); ---->step 3) expand
the hash size, but the items are not moved to the new buckets.
mutex_unlock(&cache_lock); --->step 4) release
the lock, MAY be not thread-safe.
}
}
return NULL;
}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
and in the function "do_item_alloc", using the function "item_trylock"
to get the item's lock. please look carefully at "item_trylock", it
directly access "item_locks", hoping a "no-op" as it said in the comment.
the item(search) is not lock at all.
So, after the checker "if (refcount_incr(&search->refcount) != 2) ",
other threads may hold the item and increase the refcounte, that will make
item *do_item_alloc(char *key, const size_t nkey, const int flags,
const rel_time_t exptime, const int nbytes,
const uint32_t cur_hv) {
uint8_t nsuffix;
item *it = NULL;
char suffix[40];
size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix,
&nsuffix);
if (settings.use_cas) {
ntotal += sizeof(uint64_t);
}
unsigned int id = slabs_clsid(ntotal);
if (id == 0)
return 0;
mutex_lock(&cache_lock);
/* do a quick check if we have any expired items in the tail.. */
int tries = 5;
int tried_alloc = 0;
item *search;
void *hold_lock = NULL;
rel_time_t oldest_live = settings.oldest_live;
search = tails[id];
/* We walk up *only* for locked items. Never searching for expired.
* Waste of CPU for almost all deployments */
for (; tries > 0 && search != NULL; tries--, search=search->prev) {
if (search->nbytes == 0 && search->nkey == 0 && search->it_flags ==
1) {
/* We are a crawler, ignore it. */
tries++;
continue;
}
uint32_t hv = hash(ITEM_key(search), search->nkey);
/* Attempt to hash item lock the "search" item. If locked, no
* other callers can incr the refcount
*/
/* Don't accidentally grab ourselves, or bail if we can't quicklock
*/
if (hv == cur_hv || (hold_lock = item_trylock(hv)) == NULL)
---------------> 1) item_trylock always get lock from item_locks[], not
global_lock
continue;
/* Now see if the item is refcount locked */
if (refcount_incr(&search->refcount) != 2) {
---------------> 2) now the serch->refcount==2, means only the lru-link
reference the item.
refcount_decr(&search->refcount);
/* Old rare bug could cause a refcount leak. We haven't seen
* it in years, but we leave this code in to prevent failures
* just in case */
if (settings.tail_repair_time &&
search->time + settings.tail_repair_time <
current_time) {
itemstats[id].tailrepairs++;
search->refcount = 1;
do_item_unlink_nolock(search, hv);
}
if (hold_lock)
item_trylock_unlock(hold_lock);
continue;
}
/* Expired or flushed */
if ((search->exptime != 0 && search->exptime < current_time)
------------------> 3) after this line, other thread may got hold the item
and increase the refcount.
----------- ------- so if the item is alloc as a new item (refcount reset
to 1), then the other thread(
-------------------hold the item) call do_item_remove, it will free
the "new" item(because the
refcount
-------------------is 0). this is very highly possible at the item evicted
case.
|| (search->time <= oldest_live && oldest_live <=
current_time)) {
itemstats[id].reclaimed++;
if ((search->it_flags & ITEM_FETCHED) == 0) {
itemstats[id].expired_unfetched++;
}
it = search;
slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it),
ntotal);
do_item_unlink_nolock(it, hv);
/* Initialize the item block: */
it->slabs_clsid = 0;
} else if ((it = slabs_alloc(ntotal, id)) == NULL) {
tried_alloc = 1;
if (settings.evict_to_free == 0) {
itemstats[id].outofmemory++;
} else {
itemstats[id].evicted++;
itemstats[id].evicted_time = current_time - search->time;
if (search->exptime != 0)
itemstats[id].evicted_nonzero++;
if ((search->it_flags & ITEM_FETCHED) == 0) {
itemstats[id].evicted_unfetched++;
}
/* Special case. When ITEM_LOCK_GLOBAL mode is enabled, this should become a
* no-op, as it's only called from within the item lock if necessary.
* However, we can't mix a no-op and threads which are still synchronizing
to
* GLOBAL. So instead we just always try to lock. When in GLOBAL mode this
* turns into an effective no-op. Threads re-synchronize after the power
level
* switch so it should stay safe.
*/
void *item_trylock(uint32_t hv) {
pthread_mutex_t *lock = &item_locks[hv & hashmask(item_lock_hashpower)];
if (pthread_mutex_trylock(lock) == 0) {
return lock;
}
return NULL;
}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
gdb reproduce(partial):
b assoc.c:211
b item_get
b items.c:158
(gdb) info thread
Id Target Id Frame
6 Thread 0x7ffff4d71700 (LWP 12647) "memcached" (running)
5 Thread 0x7ffff5572700 (LWP 12646) "memcached" (running)
4 Thread 0x7ffff5d73700 (LWP 12645) "memcached" (running)
* 3 Thread 0x7ffff6574700 (LWP 12644) "memcached" (running)
2 Thread 0x7ffff6d75700 (LWP 12643) "memcached" (running)
1 Thread 0x7ffff7fe0740 (LWP 12642) "memcached" 0x00007ffff76ad9a3 in
epoll_wait () at ../sysdeps/unix/syscall-template.S:81
////step 1////add a new item, modify the value of hash_items to 100000, to
making a hash expanding.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6574700 (LWP 12644))]
#0 do_item_alloc (key=0x7ffff0026414 "e", nkey=1, flags=0, exptime=0,
nbytes=3, cur_hv=0) at items.c:159
159 tried_alloc = 1;
(gdb) c
Continuing.
Breakpoint 4, assoc_insert (it=0x7ffff7f35f70, hv=1187334374) at assoc.c:170
170 if (! expanding && hash_items > (hashsize(hashpower) * 3) / 2) {
(gdb) 508 it = do_item_get(key, nkey, hv);
p hash_items
$4 = 1000000
Breakpoint 1, assoc_maintenance_thread (arg=0x0) at assoc.c:211
211 item_lock_global();
(gdb) info thread
Id Target Id Frame
6 Thread 0x7ffff4d71700 (LWP 12647) "memcached"
assoc_maintenance_thread (arg=0x0) at assoc.c:211
5 Thread 0x7ffff5572700 (LWP 12646) "memcached" (running)
4 Thread 0x7ffff5d73700 (LWP 12645) "memcached" (running)
* 3 Thread 0x7ffff6574700 (LWP 12644) "memcached" (running)
2 Thread 0x7ffff6d75700 (LWP 12643) "memcached" (running)
1 Thread 0x7ffff7fe0740 (LWP 12642) "memcached" (running)
/////delete all key in the cache.
/////step 2/// add a item, key="71912_yhd.serial.product.get_1.0_0", to the
slot 2, as the first item now.
////step 3///get the key="71912_yhd.serial.product.get_1.0_0",
///step 4 //and try to add a new key="Y_ORDER_225426358_02" to the slot 2.
Breakpoint 5, do_item_alloc
(key=0x7ffff0026414 "71912_yhd.serial.product.get_1.0_0", nkey=34, flags=0,
exptime=0, nbytes=3, cur_hv=0) at items.c:92
92 const uint32_t cur_hv) {
(gdb) n
....
Breakpoint 3, item_get
(key=0x7fffe0026414 "71912_yhd.serial.product.get_1.0_0", nkey=34) at
thread.c:506
506 hv = hash(key, nkey);
(gdb) info thread
Id Target Id Frame
6 Thread 0x7ffff4d71700 (LWP 12647) "memcached"
assoc_maintenance_thread (arg=0x0) at assoc.c:211
5 Thread 0x7ffff5572700 (LWP 12646) "memcached" (running)
4 Thread 0x7ffff5d73700 (LWP 12645) "memcached" (running)
3 Thread 0x7ffff6574700 (LWP 12644) "memcached" do_item_alloc
(key=0x7ffff0026414 "Y_ORDER_225426358_02", nkey=20, flags=0, exptime=0,
nbytes=22, cur_hv=0) at items.c:147
* 2 Thread 0x7ffff6d75700 (LWP 12643) "memcached" item_get
(key=0x7fffe0026414 "71912_yhd.serial.product.get_1.0_0", nkey=34) at
thread.c:506
1 Thread 0x7ffff7fe0740 (LWP 12642) "memcached" (running)
(gdb) n
507 item_lock(hv);
(gdb)
508 it = do_item_get(key, nkey, hv);
(gdb)
509 item_unlock(hv);
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6574700 (LWP 12644))]
#0 do_item_alloc (key=0x7ffff0026414 "Y_ORDER_225426358_02", nkey=20,
flags=0, exptime=0, nbytes=22, cur_hv=0) at items.c:147
147 if ((search->exptime != 0 && search->exptime < current_time)
(gdb) p search->refcount
$21 = 3 -------> now the refcount is
wrong. (may be in the evicted process is better).
(gdb) bt
#0 do_item_alloc (key=0x7ffff0026414 "Y_ORDER_225426358_02", nkey=20,
flags=0, exptime=0, nbytes=22, cur_hv=0) at items.c:147
#1 0x0000000000417c26 in item_alloc
(key=0x7ffff0026414 "Y_ORDER_225426358_02", nkey=20, flags=0, exptime=0,
nbytes=22) at thread.c:495
#2 0x000000000040ace9 in process_update_command (c=0x7ffff0026200,
tokens=0x7ffff6573be0, ntokens=6, comm=2, handle_cas=false) at
memcached.c:3084
#3 0x000000000040bde2 in process_command (c=0x7ffff0026200,
command=0x7ffff0026410 "set") at memcached.c:3437
#4 0x000000000040cc47 in try_read_command (c=0x7ffff0026200) at
memcached.c:3763
#5 0x000000000040d958 in drive_machine (c=0x7ffff0026200) at
memcached.c:4108
#6 0x000000000040e4f7 in event_handler (fd=37, which=2,
arg=0x7ffff0026200) at memcached.c:4353
#7 0x00007ffff7ba3f24 in event_base_loop () from
/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#8 0x0000000000417926 in worker_libevent (arg=0x635ea0) at thread.c:386
#9 0x00007ffff7980182 in start_thread (arg=0x7ffff6574700) at
pthread_create.c:312
#10 0x00007ffff76ad30d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Connected to 127.0.0.1.
Escape character is '^]'.
set a 0 0 1
1
STORED
^[[A^[[B
ERROR
set b 0 0 1
2
STORED
get 71912_yhd.serial.product.get_1.0_0
jason@gy:~$ telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set c 0 0 1
3
STORED
set 71912_yhd.serial.product.get_1.0_0 0 0 1
3
STORED
delete 71912_yhd.serial.product.get_1.0_0
DELETED
set 71912_yhd.serial.product.get_1.0_0 0 0 1
3
STORED
delete Y_ORDER_225426358_02
delete a
delete b
delete c
NOT_FOUND
DELETED
NOT_FOUND
NOT_FOUND
set Y_ORDER_225426358_02 0 0 20
aaaaaaaaaaaaaaaaaaaa
//////////////////////////////////////////
I will try to write a test script to reproduce it, and try to commit a
patch to fix it.
--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings
--
---
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.