I've only got 2 servers hitting this. Currently, the connection limit is set to 1024, but I can increase that.
I'm running now. Looking at memcached stats, the value of listen_disabled_num is 0.

My pecl/memcache library is 2.2.5, latest stable.

Vladimir, I do have a cacti installation. Looking at that, I see a CPU peak at that time, but that may just be a result of having 40 apache threads actively churning?

On Tue, Sep 22, 2009 at 5:53 PM, dormando <[email protected]> wrote:

> Okay,
>
> Smells like you're leaking memcached connection objects somewhere, or you
> have a ton of servers? During these spikes, can you telnet to memcached
> and run the 'stats' command, or can you not connect either?
>
> Try restarting memcached with -c (connection limit) set to 32767 or
> somesuch. See if that changes things.
>
> Is your pecl/memcache library fully upgraded?
>
> If you're using memcached 1.2.8 or later the 'stats' output has a value
> 'listen_disabled_num' - if that value is nonzero, or incrementing, you're
> hitting the connection limit on memcached.
>
> On Tue, 22 Sep 2009, nsheth wrote:
>
> > I've already looked in some detail at that, but haven't been able to
> > discern any real pattern. I'll look again, though.
> >
> > I suspect memcache, as whenever I experience this, I get a flood of
> > messages in my error log like:
> >
> > [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > memcache_pconnect() [<a href='function.memcache-pconnect'>function.
> > memcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown error
> > (0) in /var/www/html/memcache.php on line 174, referer: xxxx
> >
> > On Sep 22, 5:31 pm, dormando <[email protected]> wrote:
> > > Hey,
> > >
> > > Can you troubleshoot it more carefully without thinking it's specific to
> > > memcached? How'd you track it down to memcached in the first place?
> > >
> > > When your load is spiking, what requests are hitting your server? Can you
> > > look at an apache server-status page to see what's in flight, or
> > > re-assemble such a view from the logs?
> > >
> > > It smells like you're getting a short flood of traffic. If you can see
> > > what type of traffic you're getting at the time of the load spike you can
> > > reproduce it yourself... Load the page yourself, time how long it takes to
> > > render, then break it down and see what it's doing.
> > >
> > > If it's related to memcached, it's still likely to be a bug in how you're
> > > using it internally (looping wrong, or something) - since your load is
> > > related to the number of apache procs, and you claim it's not swapping,
> > > it's either doing disk io or running CPU hard.
> > >
> > > -Dormando
> > >
> > > On Tue, 22 Sep 2009, nsheth wrote:
> > >
> > > > Hmm, just saw the same issue occur again. Load spiked to 35-40.
> > > > (I've set MaxClients to 40 in apache, and looking at the status page,
> > > > I see it basically using every thread, so that may explain that load
> > > > level).
> > > >
> > > > Going back on the connections, it looks like we've got about 1.2k
> > > > connections in various states, so nowhere near any of these limits.
> > > >
> > > > Any other thoughts?
> > > >
> > > > Thanks!
> > > >
> > > > On Sep 18, 3:30 pm, nsheth <[email protected]> wrote:
> > > > > We weren't experiencing any abnormal connection levels.
> > > > >
> > > > > I did upgrade to the latest client and server version 1.4.1. So far
> > > > > so good . . .
> > > > >
> > > > > On Sep 15, 10:36 pm, nsheth <[email protected]> wrote:
> > > > > > The machine isn't swapping, actually. I'll try to "catch" it
> > > > > > happening next time and see if I can get more information about the
> > > > > > connections used . . . and also look into upgrading to 1.4.1,
> > > > > > hopefully that helps.
> > > > > >
> > > > > > On Sep 15, 6:19 pm, Vladimir <[email protected]> wrote:
> > > > > > > I do question whether those would actually cause load to spike up.
> > > > > > > Perhaps connection refused but I suspect those two ie. load spike and
> > > > > > > connection refused are linked. Please correct if I am wrong. I just
> > > > > > > checked my tcp_time_wait metrics and they peak around 600 even during
> > > > > > > these load spikes.
> > > > > > >
> > > > > > > Eric Day wrote:
> > > > > > > > If you discover this is a TIME_WAIT issue (too many TCP sockets
> > > > > > > > waiting around in kernel), you can tweak this in the kernel:
> > > > > > > >
> > > > > > > > # cat /proc/sys/net/ipv4/tcp_fin_timeout
> > > > > > > > 60
> > > > > > > >
> > > > > > > > # cat /proc/sys/net/ipv4/ip_local_port_range
> > > > > > > > 32768 61000
> > > > > > > >
> > > > > > > > 61000-32768= 28232
> > > > > > > >
> > > > > > > > (these are the defaults on Debian Linux).
> > > > > > > >
> > > > > > > > So you only have a pool of 28232 sockets to work with, and each will
> > > > > > > > linger around for 60 seconds in a TIME_WAIT state even after being
> > > > > > > > close()d on both ends. You can increase your port range and lower
> > > > > > > > your TIME_WAIT value to buy you a larger window. Something to keep
> > > > > > > > in mind though for any clients/servers that have a high connect rate.
> > > > > > > >
> > > > > > > > -Eric
> > > > > > > >
> > > > > > > > On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
> > > > > > > >
> > > > > > > >> Too many connections in CLOSE_WAIT state ?
> > > > > > > >>
> > > > > > > >> Anyways I would highly recommend installing something like Ganglia to get
> > > > > > > >> some types of metrics.
> > > > > > > >>
> > > > > > > >> Also at 35-50 machine is not doing much other than swapping.
> > > > > > > >>
> > > > > > > >> Stephen Johnston wrote:
> > > > > > > >>
> > > > > > > >> This is a total long shot, but we spent a lot of time figuring out a
> > > > > > > >> similar issue that ended up being ephemeral port exhaustion.
> > > > > > > >>
> > > > > > > >> Stephen Johnston
> > > > > > > >>
> > > > > > > >> On Tue, Sep 15, 2009 at 8:27 PM, Vladimir <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> nsheth wrote:
> > > > > > > >>
> > > > > > > >> About once a day, usually during peak traffic times, I hit some major
> > > > > > > >> load issues. I'm running memcached on the same boxes as my
> > > > > > > >> webservers. Load usually spikes to 35-50, and I see the apache error
> > > > > > > >> log flooded with messages like the following:
> > > > > > > >>
> > > > > > > >> [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > > > > > > >> memcache_pconnect() [<a href='function.memcache-pconnect'>function.
> > > > > > > >> memcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown error
> > > > > > > >> (0) in /var/www/html/memcache.php on line 174, referer: xxxx
> > > > > > > >>
> > > > > > > >> Any thoughts? Restart apache, and everything clears up.
> > > > > > > >>
> > > > > > > >> It's PHP. I have seen something but in last couple weeks it has
> > > > > > > >> "cleared" itself. It could be coincidental with using memcached 1.4.1,
> > > > > > > >> code changes etc. I actually have some Ganglia snapshots of the
> > > > > > > >> behavior you are describing here
> > > > > > > >>
> > > > > > > >> http://2tu.us/pgr
> > > > > > > >>
> > > > > > > >> Reason why load goes to 35-50 is that Apache starts consuming greater
> > > > > > > >> and greater amounts of memory indicating a PHP memory leak. Granted it
> > > > > > > >> could also have something to do with session garbage collection.
> > > > > > > >>
> > > > > > > >> I'm running memcached 1.2.5 currently (which looks to be a bit out of
> > > > > > > >> date at this point, so perhaps an update is in order).
> > > > > > > >>
> > > > > > > >> I think that would be a wise choice.
> > > > > > > >>
> > > > > > > >> Vladimir
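
Two checks come up in this thread: whether listen_disabled_num is climbing between samples, and how large the ephemeral port window is. Below is a rough Python sketch of both, not something anyone in the thread posted. The function names are hypothetical. It assumes memcached's plain-text 'stats' reply ("STAT name value" lines ending with "END"), and it takes the 60-second linger figure and the 32768-61000 port range straight from Eric's message without verifying them for any particular kernel.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Parse the text reply of memcached's 'stats' command into a dict.

    Only lines of the form 'STAT <name> <value>' are kept; the trailing
    'END' line is ignored. Values are left as strings.
    """
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str, port: int = 11211, timeout: float = 2.0) -> dict:
    """Send 'stats' over a plain TCP socket and return the parsed reply.

    Hypothetical helper: it does by hand what 'telnet host 11211' followed
    by 'stats' would do.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))


def listen_disabled_delta(before: dict, after: dict) -> int:
    """How much listen_disabled_num grew between two 'stats' samples."""
    return (int(after.get("listen_disabled_num", 0))
            - int(before.get("listen_disabled_num", 0)))


def ephemeral_port_budget(low: int, high: int, linger_seconds: float):
    """Return (ports in range, sustainable new connections per second).

    Assumes, as in Eric's message, that every closed connection holds its
    port for linger_seconds before it can be reused.
    """
    ports = high - low
    return ports, ports / linger_seconds
```

With the Debian defaults quoted above, ephemeral_port_budget(32768, 61000, 60) gives the same 28232 ports, or roughly 470 new outbound connections per second before the pool runs dry. To watch the connection limit during a spike, fetch_stats("10.0.0.5") could be sampled twice a few seconds apart and the two results passed to listen_disabled_delta; a positive result would mean memcached refused to accept connections in that window.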
