Wrong; for omg in `seq 1 30` ; do yes > /dev/null & done
observe load hit 30.

-Dormando

On Tue, 22 Sep 2009, Vladimir Vuksan wrote:

> I don't think running CPU hard would explain. You could have 100% CPU
> utilization and load of one. Load of 35-40 is usually related to some
> type of IO. Most cases disk IO however network IO is not out of
> question. I would suggest installing something like Ganglia to get some
> actionable metrics. My money is on Apache consuming ever increasing
> amounts of memory.
>
> dormando wrote:
> > Can you troubleshoot it more carefully without thinking it's specific
> > to memcached? How'd you track it down to memcached in the first place?
> >
> > When your load is spiking, what requests are hitting your server? Can
> > you look at an apache server-status page to see what's in flight, or
> > re-assemble such a view from the logs?
> >
> > It smells like you're getting a short flood of traffic. If you can see
> > what type of traffic you're getting at the time of the load spike you
> > can reproduce it yourself... Load the page yourself, time how long it
> > takes to render, then break it down and see what it's doing.
> >
> > If it's related to memcached, it's still likely to be a bug in how
> > you're using it internally (looping wrong, or something) - since your
> > load is related to the number of apache procs, and you claim it's not
> > swapping, it's either doing disk io or running CPU hard.
> >
> > -Dormando
> >
> > On Tue, 22 Sep 2009, nsheth wrote:
> > > Hmm, just saw the same issue occur again. Load spiked to 35-40.
> > > (I've set MaxClients to 40 in apache, and looking at the status
> > > page, I see it basically using every thread, so that may explain
> > > that load level).
> > >
> > > Going back on the connections, it looks like we've got about 1.2k
> > > connections in various states, so nowhere near any of these limits.
> > >
> > > Any other thoughts?
> > >
> > > Thanks!
> > >
> > > On Sep 18, 3:30 pm, nsheth <[email protected]> wrote:
> > > > We weren't experiencing any abnormal connection levels.
> > > >
> > > > I did upgrade to the latest client and server version 1.4.1.
> > > > So far so good . . .
> > > >
> > > > On Sep 15, 10:36 pm, nsheth <[email protected]> wrote:
> > > > > The machine isn't swapping, actually. I'll try to "catch" it
> > > > > happening next time and see if I can get more information about
> > > > > the connections used . . . and also look into upgrading to 1.4.1,
> > > > > hopefully that helps.
> > > > >
> > > > > On Sep 15, 6:19 pm, Vladimir <[email protected]> wrote:
> > > > > > I do question whether those would actually cause load to spike
> > > > > > up. Perhaps connection refused but I suspect those two ie. load
> > > > > > spike and connection refused are linked. Please correct if I am
> > > > > > wrong. I just checked my tcp_time_wait metrics and they peak
> > > > > > around 600 even during these load spikes.
> > > > > >
> > > > > > Eric Day wrote:
> > > > > > > If you discover this is a TIME_WAIT issue (too many TCP
> > > > > > > sockets waiting around in kernel), you can tweak this in the
> > > > > > > kernel:
> > > > > > >
> > > > > > > # cat /proc/sys/net/ipv4/tcp_fin_timeout
> > > > > > > 60
> > > > > > >
> > > > > > > # cat /proc/sys/net/ipv4/ip_local_port_range
> > > > > > > 32768 61000
> > > > > > >
> > > > > > > 61000 - 32768 = 28232
> > > > > > >
> > > > > > > (these are the defaults on Debian Linux).
> > > > > > >
> > > > > > > So you only have a pool of 28232 sockets to work with, and
> > > > > > > each will linger around for 60 seconds in a TIME_WAIT state
> > > > > > > even after being close()d on both ends. You can increase your
> > > > > > > port range and lower your TIME_WAIT value to buy you a larger
> > > > > > > window. Something to keep in mind though for any
> > > > > > > clients/servers that have a high connect rate.
> > > > > > >
> > > > > > > -Eric
> > > > > > >
> > > > > > > On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
> > > > > > > > Too many connections in CLOSE_WAIT state?
> > > > > > > >
> > > > > > > > Anyways I would highly recommend installing something like
> > > > > > > > Ganglia to get some types of metrics.
> > > > > > > >
> > > > > > > > Also at 35-50 machine is not doing much other than swapping.
> > > > > > > >
> > > > > > > > Stephen Johnston wrote:
> > > > > > > > > This is a total long shot, but we spent a lot of time
> > > > > > > > > figuring out a similar issue that ended up being
> > > > > > > > > ephemeral port exhaustion.
> > > > > > > > >
> > > > > > > > > Stephen Johnston
> > > > > > > > >
> > > > > > > > > On Tue, Sep 15, 2009 at 8:27 PM, Vladimir <[email protected]> wrote:
> > > > > > > > > > nsheth wrote:
> > > > > > > > > > > About once a day, usually during peak traffic times,
> > > > > > > > > > > I hit some major load issues. I'm running memcached
> > > > > > > > > > > on the same boxes as my webservers. Load usually
> > > > > > > > > > > spikes to 35-50, and I see the apache error log
> > > > > > > > > > > flooded with messages like the following:
> > > > > > > > > > >
> > > > > > > > > > > [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > > > > > > > > > > memcache_pconnect() [<a href='function.memcache-pconnect'>function.memcache-pconnect</a>]:
> > > > > > > > > > > Can't connect to 10.0.0.5:11211, Unknown error (0) in
> > > > > > > > > > > /var/www/html/memcache.php on line 174, referer: xxxx
> > > > > > > > > > >
> > > > > > > > > > > Any thoughts? Restart apache, and everything clears up.
> > > > > > > > > >
> > > > > > > > > > It's PHP. I have seen something but in last couple weeks
> > > > > > > > > > it has "cleared" itself. It could be coincidental with
> > > > > > > > > > using memcached 1.4.1, code changes etc. I actually have
> > > > > > > > > > some Ganglia snapshots of the behavior you are
> > > > > > > > > > describing here
> > > > > > > > > >
> > > > > > > > > > http://2tu.us/pgr
> > > > > > > > > >
> > > > > > > > > > Reason why load goes to 35-50 is that Apache starts
> > > > > > > > > > consuming greater and greater amounts of memory
> > > > > > > > > > indicating a PHP memory leak. Granted it could also have
> > > > > > > > > > something to do with session garbage collection.
> > > > > > > > > >
> > > > > > > > > > > I'm running memcached 1.2.5 currently (which looks to
> > > > > > > > > > > be a bit out of date at this point, so perhaps an
> > > > > > > > > > > update is in order).
> > > > > > > > > >
> > > > > > > > > > I think that would be a wise choice.
> > > > > > > > > >
> > > > > > > > > > Vladimir
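
For anyone checking the ephemeral port arithmetic from Eric Day's message, here is a minimal Python sketch of the same back-of-the-envelope calculation. It is not from the thread: the function names `port_pool_size` and `max_sustained_connect_rate` are hypothetical, and it assumes each closed connection holds one local port for the full 60 seconds quoted in the thread.

```python
def port_pool_size(low: int, high: int) -> int:
    """Ephemeral port pool size, computed as in the thread (high - low)."""
    return high - low


def max_sustained_connect_rate(pool: int, linger_seconds: int) -> float:
    """Rough ceiling on new outbound connections per second before the pool
    runs dry, assuming every closed socket keeps its port for linger_seconds."""
    return pool / linger_seconds


if __name__ == "__main__":
    # Values quoted in the thread as the Debian defaults of the time.
    pool = port_pool_size(32768, 61000)
    rate = max_sustained_connect_rate(pool, 60)
    print(f"pool={pool} ports, roughly {rate:.0f} connects/sec sustainable")
```

The ~1.2k connections nsheth reports and the tcp_time_wait peak of around 600 that Vladimir mentions both sit well under a pool of 28232, which matches nsheth's remark that the box was nowhere near these limits.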
