*There are no deadlocks.* Here is the thread info from gdb:
(gdb) info thread
* 5 Thread 0xf7771b70 (LWP 24962) 0x080509dd in transmit (fd=431, which=2,
arg=0xfef8ce48)
at memcached.c:4044
4 Thread 0xf6d70b70 (LWP 24963) 0x007ad430 in __kernel_vsyscall ()
3 Thread 0xf636fb70 (LWP 24964) 0x007ad430 in __kernel_vsyscall ()
2 Thread 0xf596eb70 (LWP 24965) 0x007ad430 in __kernel_vsyscall ()
1 Thread 0xf77b38d0 (LWP 24961) 0x007ad430 in __kernel_vsyscall ()
(gdb) t 1
[Switching to thread 1 (Thread 0xf77b38d0 (LWP 24961))]#0 0x007ad430 in
__kernel_vsyscall ()
(gdb) bt
#0 0x007ad430 in __kernel_vsyscall ()
#1 0x005c5366 in epoll_wait () from /lib/libc.so.6
#2 0x0074a750 in epoll_dispatch (base=0x9305008, arg=0x93053c0,
tv=0xff8e0cdc) at epoll.c:198
#3 0x0073d714 in event_base_loop (base=0x9305008, flags=0) at event.c:538
#4 0x08054467 in main (argc=19, argv=0xff8e2274) at memcached.c:5795
(gdb)
(gdb) t 2
[Switching to thread 2 (Thread 0xf596eb70 (LWP 24965))]#0 0x007ad430 in
__kernel_vsyscall ()
(gdb) bt
#0 0x007ad430 in __kernel_vsyscall ()
#1 0x00a652bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#2 0x08055662 in slab_rebalance_thread (arg=0x0) at slabs.c:859
#3 0x00a61a49 in start_thread () from /lib/libpthread.so.0
#4 0x005c4aee in clone () from /lib/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0xf636fb70 (LWP 24964))]#0 0x007ad430 in
__kernel_vsyscall ()
(gdb) bt
#0 0x007ad430 in __kernel_vsyscall ()
#1 0x005838b6 in nanosleep () from /lib/libc.so.6
#2 0x005836e0 in sleep () from /lib/libc.so.6
#3 0x08056f6e in slab_maintenance_thread (arg=0x0) at slabs.c:819
#4 0x00a61a49 in start_thread () from /lib/libpthread.so.0
#5 0x005c4aee in clone () from /lib/libc.so.6
(gdb) t 4
[Switching to thread 4 (Thread 0xf6d70b70 (LWP 24963))]#0 0x007ad430 in
__kernel_vsyscall ()
(gdb) bt
#0 0x007ad430 in __kernel_vsyscall ()
#1 0x00a652bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#2 0x080599f5 in assoc_maintenance_thread (arg=0x0) at assoc.c:251
#3 0x00a61a49 in start_thread () from /lib/libpthread.so.0
#4 0x005c4aee in clone () from /lib/libc.so.6
(gdb) t 5
[Switching to thread 5 (Thread 0xf7771b70 (LWP 24962))]#0 0x007ad430 in
__kernel_vsyscall ()
(gdb) bt
#0 0x007ad430 in __kernel_vsyscall ()
#1 0x00a68998 in sendmsg () from /lib/libpthread.so.0
#2 0x080509dd in transmit (fd=431, which=2, arg=0xfef8ce48) at
memcached.c:4044
#3 drive_machine (fd=431, which=2, arg=0xfef8ce48) at memcached.c:4370
#4 event_handler (fd=431, which=2, arg=0xfef8ce48) at memcached.c:4441
#5 0x0073d9e4 in event_process_active (base=0x9310658, flags=0) at
event.c:395
#6 event_base_loop (base=0x9310658, flags=0) at event.c:547
#7 0x08059fee in worker_libevent (arg=0x930c698) at thread.c:471
#8 0x00a61a49 in start_thread () from /lib/libpthread.so.0
#9 0x005c4aee in clone () from /lib/libc.so.6
(gdb)
*strace output. Is maxconnsevent the only event registered on epoll?*
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 10084037}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 20246365}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 30382098}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 40509766}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 50657403}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 60823841}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 71013006}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 81234264}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 91407508}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 101581187}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 111752457}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 121919049}) = 0
epoll_wait(4, {}, 32, 10) = 0
clock_gettime(CLOCK_MONOTONIC, {8374269, 132057597}) = 0
On Wednesday, October 29, 2014 at 2:47:23 PM UTC+8, Samdy Sun wrote:
>
> Hello,
> I hit a problem where memcached-1.4.20 gets stuck after EMFILE happens.
> Here is my memcached command line: "memcached -s /xxx/mc_usock.11201 -c
> 1024 -m 4000 -f 1.05 -o slab_automove -o slab_reassign -t 1 -p 11201".
>
> cat /proc/version
> Linux version 2.6.32-358.el6.x86_64 (
> [email protected]) (gcc version 4.4.7 20120313
> (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
>
> memcached-1.4.20 gets stuck and stops working after it has been running
> for a period of time.
>
> Here is some information from gdb:
> (gdb) p stats
> $2 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __kind =
> 0, __nusers = 0, {__spins = 0,
> __list = {__next = 0x0}}}, __size = '\000' <repeats 23 times>,
> __align = 0}, curr_items = 149156,
> total_items = 9876811, curr_bytes = 3712501870,* curr_conns = 5*,
> total_conns = 39738, rejected_conns = 0,
> malloc_fails = 0, *reserved_fds = 5*, *conn_structs = 1012*, get_cmds =
> 0, set_cmds = 0, touch_cmds = 0,
> get_hits = 0, get_misses = 0, touch_hits = 0, touch_misses = 0,
> evictions = 0, reclaimed = 0,
> started = 0, *accepting_conns = false*, *listen_disabled_num = 1*,
> hash_power_level = 17,
> hash_bytes = 524288, hash_is_expanding = false, expired_unfetched = 0,
> evicted_unfetched = 0,
> slab_reassign_running = false, slabs_moved = 20, lru_crawler_running =
> false,
> disable_write_by_exptime = 0, disable_write_by_length = 0,
> disable_write_by_access = 0,
> evicted_write_reply_timeout_times = 0}
>
> (gdb) p allow_new_conns
> $4 = false
>
> I found that "allow_new_conns" is set to false only when "accept" fails
> and errno is "EMFILE".
> Here is the code:
> static void drive_machine(conn *c) {
> ……
> } else if (errno == EMFILE) {
> if (settings.verbose > 0)
> fprintf(stderr, "Too many open connections\n");
> * accept_new_conns(false);*
> stop = true;
> } else {
> ……
> }
>
> If I change the flag "allow_new_conns", it works again, as shown below:
> *(gdb) set allow_new_conns=1*
> (gdb) p allow_new_conns
> $5 = true
> (gdb) c
> Continuing.
>
> I know that "allow_new_conns" is set back to "true" when "conn_close" is
> called. But how could that happen in the case where "accept" failed with
> errno "EMFILE" and this connection is the only one used for accepting?
> Note that *curr_conns = 5*.
> The process has not run out of fds:
> ls /proc/1748(memcached_pid)/fd | wc -l
> 17
>
>
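
EMFILE means the per-process fd limit was reached when accept() was called, so the fd count seen later (17) does not have to match the count at that moment. Below is a minimal sketch for comparing the two numbers on Linux. The helper names `count_open_fds` and `soft_nofile_limit` are my own, not from memcached. Note that getrlimit() only reports the limit of the calling process, so for an already running memcached the limit has to be read from /proc/<pid>/limits instead.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Count the entries in /proc/<pid>/fd (Linux only). Returns -1 on error.
 * When pid is the calling process, the count includes the descriptor
 * that opendir() itself holds while the directory is being read. */
static long count_open_fds(pid_t pid) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        if (e->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(dir);
    return n;
}

/* Soft RLIMIT_NOFILE of the calling process.
 * Returns LONG_MAX if the limit is unlimited, -1 on error. */
static long soft_nofile_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}
```

Calling `count_open_fds(1748)` periodically and logging the result next to the limit would show whether the process ever gets close to the limit before it stops accepting.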