> Hello,
>    I don't know whether this is a memcached issue (or an OS / libevent one),
>    but from time to time (about every 2 days)
>   one or two of our 6 memcached servers stop responding to / accepting new
> connections, and a restart is required :(
>   The strace output is:
>  
> Process 20417 attached - interrupt to quit
> 00:21:15 epoll_wait(3, {}, 32, 1)       = 0 <0.001060>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010074>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010072>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010072>
> 00:21:15 epoll_wait(3, {}, 32, 10)      = 0 <0.010073>
> 00:21:15 epoll_wait(3, ^C <unfinished ...>
>  
> Process 20417 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   -nan    0.000000           0       789           epoll_wait
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000000                   789           total
> pidof memcached
> 20417
>  
>  
> 00:21:55 up 2 days, 15:20,  1 user,  load average: 4.50, 4.75, 4.25
> top - 00:22:12 up 2 days, 15:20,  1 user,  load average: 4.57, 4.75, 4.26
> Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 62.6 us,  0.0 sy,  0.0 ni, 37.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  32922088 total,  6595136 used, 26326952 free,   169480 buffers
> KiB Swap:  7811068 total,        0 used,  7811068 free,  4703516 cached
>  
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 20417 nobody    20   0  941m 434m 1036 S 799 1.4 817:03.77 memcached
>  
> MemcachePool::getstats(): Server localhost (tcp 11211, udp 0) failed with: Connection timed out (110)
>  
> netstat -tpn|grep ':11211'|wc -l
> 44
>  
> memcached_1.4.15 libevent_2.0.19-stable or 2.0.21-stable
>  
> /usr/bin/memcached -m 30720 -u nobody -t 8
>   
>  
> 3.5.0-22-generic ubuntu-quantal
> Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> 32gb ram,
> Manufacturer: Supermicro
> Product Name: X9SCL/X9SCM
>  
>    avg_load:
>   ~8K connections/sec
>   ~40K requests/sec (set, inc, get)

That isn't a terrifically high load. Can you do:

`ls -l /proc/$(pidof memcached)/fd | wc -l`

just to confirm you aren't hitting maxconns in some odd way? (netstat
doesn't show many connections open, so it seems unlikely.)

Do you tend to run your other servers at or near maxconns?
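
If it helps, here is a rough sketch of the same check that also prints the
process's open-file limit next to the count. The helper names `count_fds` and
`fd_limit` are made up for illustration; it assumes Linux /proc, and you will
probably need to run it as root (or as the memcached user) to read another
user's fd directory.

```shell
#!/bin/sh
# Hypothetical helpers (not part of memcached); Linux /proc is assumed.

# count_fds PID: print the number of open file descriptors for PID.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fd_limit PID: print the soft "Max open files" limit for PID.
fd_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Take the first pid if pidof reports more than one instance.
pid=$(pidof memcached 2>/dev/null | awk '{ print $1 }')
if [ -n "$pid" ]; then
    echo "memcached pid=$pid open_fds=$(count_fds "$pid") fd_limit=$(fd_limit "$pid")"
fi
```

Your command line has no -c, so maxconns should be at the default of 1024;
an fd count close to that would be the thing to look for. Note that plain
`ls` is used here because `ls -l` adds a "total" line, which makes the count
one too high.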

Is your box out of TIME_WAIT buckets or ephemeral ports, or is netfilter
(connection tracking) maxed out? Check dmesg. Or is there some other issue
that would prevent it from accepting new connections until you restart
something?
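
For those checks, something along these lines should do as a starting point.
It is only a sketch: `count_time_wait` is a made-up helper name, the /proc
paths are the usual Linux ones (the nf_conntrack files only exist when the
conntrack module is loaded), and the dmesg patterns are the kernel messages I
would expect to see for these two conditions, so adjust as needed.

```shell
#!/bin/sh
# count_time_wait [FILE]: count IPv4 TCP sockets in TIME_WAIT.
# FILE defaults to /proc/net/tcp; state code 06 in the "st" column is TIME_WAIT.
count_time_wait() {
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' "${1:-/proc/net/tcp}"
}

if [ -r /proc/net/tcp ]; then
    echo "TIME_WAIT sockets: $(count_time_wait)"
fi

# Limits to compare against: ephemeral port range, TIME_WAIT bucket cap,
# and conntrack table usage (printed only if the file is present).
for f in /proc/sys/net/ipv4/ip_local_port_range \
         /proc/sys/net/ipv4/tcp_max_tw_buckets \
         /proc/sys/net/netfilter/nf_conntrack_count \
         /proc/sys/net/netfilter/nf_conntrack_max; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done

# Kernel complaints about either table overflowing (may need root).
dmesg 2>/dev/null | grep -iE 'time wait bucket table overflow|nf_conntrack: table full' || true
```

If the TIME_WAIT count is near tcp_max_tw_buckets, or nf_conntrack_count is
near nf_conntrack_max, that would point at the box rather than at memcached.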
