Hey, can you troubleshoot it more carefully without assuming it's specific to memcached? How'd you track it down to memcached in the first place?

When your load is spiking, what requests are hitting your server? Can you
look at an apache server-status page to see what's in flight, or re-assemble
such a view from the logs? It smells like you're getting a short flood of
traffic. If you can see what type of traffic you're getting at the time of
the load spike, you can reproduce it yourself: load the page yourself, time
how long it takes to render, then break it down and see what it's doing.

If it's related to memcached, it's still likely to be a bug in how you're
using it internally (looping wrong, or something) - since your load is
related to the number of apache procs, and you claim it's not swapping, it's
either doing disk I/O or running CPU hard.

-Dormando

On Tue, 22 Sep 2009, nsheth wrote:

>
> Hmm, just saw the same issue occur again. Load spiked to 35-40.
> (I've set MaxClients to 40 in apache, and looking at the status page,
> I see it basically using every thread, so that may explain that load
> level).
>
> Going back on the connections, it looks like we've got about 1.2k
> connections in various states, so nowhere near any of these limits.
>
> Any other thoughts?
>
> Thanks!
>
> On Sep 18, 3:30 pm, nsheth wrote:
> > We weren't experiencing any abnormal connection levels.
> >
> > I did upgrade to the latest client and server version 1.4.1. So far
> > so good . . .
> >
> > On Sep 15, 10:36 pm, nsheth wrote:
> >
> > > The machine isn't swapping, actually. I'll try to "catch" it
> > > happening next time and see if I can get more information about the
> > > connections used . . . and also look into upgrading to 1.4.1,
> > > hopefully that helps.
> >
> > > On Sep 15, 6:19 pm, Vladimir wrote:
> >
> > > > I do question whether those would actually cause load to spike up.
> > > > Perhaps connection refused, but I suspect those two, i.e. load spike
> > > > and connection refused, are linked. Please correct if I am wrong. I
> > > > just checked my tcp_time_wait metrics and they peak around 600 even
> > > > during these load spikes.
> >
> > > > Eric Day wrote:
> > > > > If you discover this is a TIME_WAIT issue (too many TCP sockets
> > > > > waiting around in kernel), you can tweak this in the kernel:
> >
> > > > > # cat /proc/sys/net/ipv4/tcp_fin_timeout
> > > > > 60
> >
> > > > > # cat /proc/sys/net/ipv4/ip_local_port_range
> > > > > 32768 61000
> >
> > > > > 61000 - 32768 = 28232
> >
> > > > > (these are the defaults on Debian Linux).
> >
> > > > > So you only have a pool of 28232 sockets to work with, and each will
> > > > > linger around for 60 seconds in a TIME_WAIT state even after being
> > > > > close()d on both ends. You can increase your port range and lower
> > > > > your TIME_WAIT value to buy you a larger window. Something to keep
> > > > > in mind though for any clients/servers that have a high connect rate.
> >
> > > > > -Eric
> >
> > > > > On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
> >
> > > > >> Too many connections in CLOSE_WAIT state?
> >
> > > > >> Anyways I would highly recommend installing something like
> > > > >> Ganglia to get some types of metrics.
> >
> > > > >> Also at 35-50 the machine is not doing much other than swapping.
> >
> > > > >> Stephen Johnston wrote:
> >
> > > > >> This is a total long shot, but we spent a lot of time figuring
> > > > >> out a similar issue that ended up being ephemeral port exhaustion.
> >
> > > > >> Stephen Johnston
> >
> > > > >> On Tue, Sep 15, 2009 at 8:27 PM, Vladimir wrote:
> >
> > > > >> nsheth wrote:
> >
> > > > >> About once a day, usually during peak traffic times, I hit some
> > > > >> major load issues. I'm running memcached on the same boxes as my
> > > > >> webservers. Load usually spikes to 35-50, and I see the apache
> > > > >> error log flooded with messages like the following:
> >
> > > > >> [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > > > >> memcache_pconnect() [<a
> > > > >> href='function.memcache-pconnect'>function.memcache-pconnect</a>]:
> > > > >> Can't connect to 10.0.0.5:11211, Unknown error (0) in
> > > > >> /var/www/html/memcache.php on line 174, referer: xxxx
> >
> > > > >> Any thoughts? Restart apache, and everything clears up.
> >
> > > > >> It's PHP. I have seen something but in the last couple of weeks
> > > > >> it has "cleared" itself. It could be coincidental with using
> > > > >> memcached 1.4.1, code changes etc. I actually have some Ganglia
> > > > >> snapshots of the behavior you are describing here
> >
> > > > >> http://2tu.us/pgr
> >
> > > > >> Reason why load goes to 35-50 is that Apache starts consuming
> > > > >> greater and greater amounts of memory indicating a PHP memory
> > > > >> leak. Granted it could also have something to do with session
> > > > >> garbage collection.
> >
> > > > >> I'm running memcached 1.2.5 currently (which looks to be a bit
> > > > >> out of date at this point, so perhaps an update is in order).
> >
> > > > >> I think that would be a wise choice.
> > > > >> Vladimir
>
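
The "what's in flight" view suggested at the top of this message can be roughly re-assembled from an access log. What follows is only a minimal sketch in Python, assuming the log uses Apache's common or combined format (a bracketed timestamp followed by the quoted request line). The function names and the sample paths are hypothetical and do not come from the thread.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed format: ... [13/Sep/2009:14:54:34 -0400] "GET /path HTTP/1.1" ...
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def requests_per_minute(lines: Iterable[str]) -> Counter:
    """Count requests keyed by (minute, path)."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        # The first 17 characters are "dd/Mon/yyyy:HH:MM".
        minute = match.group("ts")[:17]
        counts[(minute, match.group("path"))] += 1
    return counts


def busiest(lines: Iterable[str], top: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the (minute, path) buckets with the most requests."""
    return requests_per_minute(lines).most_common(top)


# Hypothetical sample lines, only to show the shape of the output.
sample = [
    '10.0.0.2 - - [13/Sep/2009:14:54:01 -0400] "GET /feed.php HTTP/1.1" 200 512',
    '10.0.0.2 - - [13/Sep/2009:14:54:02 -0400] "GET /feed.php HTTP/1.1" 200 512',
    '10.0.0.3 - - [13/Sep/2009:14:54:40 -0400] "GET /index.php HTTP/1.1" 200 900',
]
for (minute, path), hits in busiest(sample):
    print(minute, path, hits)
```

Running it over the minutes around a spike shows which URLs dominate, and those are the pages worth loading and timing by hand.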
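
The ephemeral-port arithmetic quoted from Eric can be turned into a rough budget. The sketch below takes the thread's figures as given (port range 32768 to 61000, sockets lingering 60 seconds) and compares them with the roughly 1.2k connections nsheth reported. The function names are made up for illustration, and the result is only an upper bound on the connect rate to a single destination, not a measurement.

```python
def ephemeral_pool(port_low: int, port_high: int) -> int:
    """Size of the local port pool, computed the same way as in the thread."""
    return port_high - port_low


def max_connect_rate(pool: int, linger_seconds: float) -> float:
    """Rough ceiling on new connections per second before the pool runs dry,
    assuming every closed socket holds its port for linger_seconds."""
    return pool / linger_seconds


def pool_utilization(connections: int, pool: int) -> float:
    """Fraction of the pool occupied by a given number of connections."""
    return connections / pool


pool = ephemeral_pool(32768, 61000)
print(pool)                                          # 28232
print(round(max_connect_rate(pool, 60), 1))          # 470.5
print(round(100 * pool_utilization(1200, pool), 1))  # 4.3
```

On those numbers, about 1.2k connections use a little over 4% of the pool, which is consistent with nsheth's remark that the box was nowhere near the limit.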
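
To "catch" the connection states during a spike without extra tooling, the counts of TIME_WAIT and CLOSE_WAIT sockets can be read from /proc/net/tcp on Linux. This is a hedged sketch: it assumes the usual /proc/net/tcp layout, where the fourth whitespace-separated field is the hex state code, and count_tcp_states is a hypothetical helper name. It covers IPv4 only; /proc/net/tcp6 would need the same treatment.

```python
import os
from collections import Counter
from typing import Iterable

# Hex state codes as used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(lines: Iterable[str]) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp-style lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row ("sl local_address ...") and short lines.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    path = "/proc/net/tcp"  # Linux only; skipped elsewhere
    if os.path.exists(path):
        with open(path) as handle:
            for state, total in count_tcp_states(handle).most_common():
                print(state, total)
```

Sampling this every few seconds while the load climbs would show whether TIME_WAIT or CLOSE_WAIT actually grows with it, or stays flat as Vladimir's metrics suggested.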
