> Hello,
> I don't know whether this is a memcached issue (or OS / libevent),
> but from time to time (roughly every 2 days)
> one or two of our 6 memcached servers stop responding to or accepting new connections,
> and a restart is required :(
> The strace output is:
>
> Process 20417 attached - interrupt to quit
> 00:21:15 epoll_wait(3, {}, 32, 1) = 0 <0.001060>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010072>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010072>
> 00:21:15 epoll_wait(3, {}, 32, 10) = 0 <0.010073>
> 00:21:15 epoll_wait(3, ^C <unfinished ...>
>
> Process 20417 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> -nan 0.000000 0 789 epoll_wait
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.000000 789 total
> pidof memcached
> 20417
>
>
> 00:21:55 up 2 days, 15:20, 1 user, load average: 4.50, 4.75, 4.25
> top - 00:22:12 up 2 days, 15:20, 1 user, load average: 4.57, 4.75, 4.26
> Tasks: 112 total, 1 running, 111 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 62.6 us, 0.0 sy, 0.0 ni, 37.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> KiB Mem: 32922088 total, 6595136 used, 26326952 free, 169480 buffers
> KiB Swap: 7811068 total, 0 used, 7811068 free, 4703516 cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 20417 nobody 20 0 941m 434m 1036 S 799 1.4 817:03.77 memcached
>
> MemcachePool::getstats(): Server localhost (tcp 11211, udp 0) failed with:
> Connection timed out (110)
>
> netstat -tpn|grep ':11211'|wc -l
> 44
>
> memcached_1.4.15 libevent_2.0.19-stable or 2.0.21-stable
>
> /usr/bin/memcached -m 30720 -u nobody -t 8
>
>
> 3.5.0-22-generic ubuntu-quantal
> Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> 32gb ram,
> Manufacturer: Supermicro
> Product Name: X9SCL/X9SCM
>
> avg_load:
> ~8K conns/sec
> ~40K requests/sec (set, inc, get)
That isn't a terrifically high load. Can you run
`ls -l /proc/$(pidof memcached)/fd | wc -l`
just to confirm you aren't hitting maxconns in some odd way? (netstat
doesn't show many connections open.)
Do you tend to run your other servers at or near maxconns?
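
For reference, a rough sketch of that check as a small script. The function names are made up for illustration. It uses plain `ls` because `ls -l` also prints a "total" line, so the `-l` form reads one higher than the real descriptor count. It assumes a single memcached instance and that you run it as root or as memcached's user, since /proc/<pid>/fd is otherwise unreadable. The 1024 figure is memcached's documented default for `-c`, which should apply here because the quoted command line passes no `-c`.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of memcached.

# count_fds DIR: number of entries in a /proc/<pid>/fd style directory.
count_fds() {
    ls "$1" | wc -l | tr -d ' '
}

# soft_nofile FILE: soft "Max open files" value from a /proc/<pid>/limits file.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# report: print both numbers for the running memcached.
# Assumes exactly one memcached process and read access to its /proc entry.
report() {
    pid=$(pidof memcached) || { echo "memcached not running" >&2; return 1; }
    echo "open fds:      $(count_fds "/proc/$pid/fd")"
    echo "soft fd limit: $(soft_nofile "/proc/$pid/limits")"
    echo "maxconns:      1024 unless -c was passed on the command line"
}
```

Calling `report` on a stuck server and on a healthy one gives two numbers to compare. An fd count close to maxconns or to the soft fd limit would point at connection exhaustion rather than the network stack.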
Is your box out of TIME_WAIT buckets or ephemeral ports, or is netfilter
(e.g. its connection tracking table) maxed out? Check dmesg. Or is there
some other issue that would prevent it from accepting new connections
until you restart something?
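
A rough sketch of those kernel-side checks, in case it saves some typing. The function names are made up. The dmesg patterns are the usual wording of the conntrack and TIME_WAIT overflow messages on Linux kernels of this era, but treat them as an assumption and skim the raw dmesg output as well. The conntrack files only exist when the nf_conntrack module is loaded.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative.

# count_time_wait: count TIME_WAIT sockets in `netstat -tn` output on stdin.
count_time_wait() {
    grep -c 'TIME_WAIT' || true
}

# kernel_drops: keep only kernel log lines (stdin) that report a full
# conntrack table or a TIME_WAIT bucket overflow.
kernel_drops() {
    grep -iE 'conntrack: table full|time wait bucket table overflow' || true
}

# check_box: run the checks on the live system.
# Assumes Linux with net-tools installed; dmesg may need root.
check_box() {
    echo "TIME_WAIT sockets: $(netstat -tn | count_time_wait)"
    echo "ephemeral ports:   $(cat /proc/sys/net/ipv4/ip_local_port_range)"
    if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
        echo "conntrack entries: $(cat /proc/sys/net/netfilter/nf_conntrack_count) of $(cat /proc/sys/net/netfilter/nf_conntrack_max)"
    fi
    dmesg | kernel_drops
}
```

Run `check_box` while a server is in the stuck state. A TIME_WAIT count close to the size of the ephemeral port range, a conntrack count at its maximum, or any line from `kernel_drops` would suggest the problem sits below memcached.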