Re: Memcached Connection Failure

dormando Tue, 22 Sep 2009 17:51:16 -0700

Wrong;

for omg in `seq 1 30` ; do yes > /dev/null & done


observe load hit 30.

-Dormando
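
The one-liner above leaves its `yes` processes running after the experiment. A minimal sketch of the same experiment with cleanup, assuming a Linux box with `/proc/loadavg`; the helper names `start_burners`, `stop_burners` and `load1` are made up for illustration:

```shell
#!/bin/sh
# start_burners N: start N CPU-bound `yes` processes in the background
# and remember their PIDs in BURNER_PIDS.
start_burners() {
    BURNER_PIDS=""
    i=0
    while [ "$i" -lt "$1" ]; do
        yes > /dev/null &
        BURNER_PIDS="$BURNER_PIDS $!"
        i=$((i + 1))
    done
}

# stop_burners: kill and reap everything start_burners launched.
# $BURNER_PIDS is left unquoted on purpose so it splits into one PID per word.
stop_burners() {
    kill $BURNER_PIDS 2>/dev/null || true
    wait $BURNER_PIDS 2>/dev/null || true
    BURNER_PIDS=""
}

# load1: print the 1-minute load average (first field of /proc/loadavg).
load1() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Example run, commented out because it keeps the CPU busy for a while:
# start_burners 30; sleep 90; load1; stop_burners
```

The 1-minute figure is a moving average, so it climbs toward the number of runnable processes over a minute or two rather than jumping there at once.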

On Tue, 22 Sep 2009, Vladimir Vuksan wrote:

> I don't think running the CPU hard would explain it. You could have 100% CPU
> utilization and a load of one. A load of 35-40 is usually related to some
> type of IO: in most cases disk IO, though network IO is not out of the
> question. I would suggest installing something like Ganglia to get some
> actionable metrics. My money is on Apache consuming ever-increasing amounts
> of memory.
>
> dormando wrote:
>
> Can you troubleshoot it more carefully without thinking it's specific to
> memcached? How'd you track it down to memcached in the first place?
>
> When your load is spiking, what requests are hitting your server? Can you
> look at an apache server-status page to see what's in flight, or
> re-assemble such a view from the logs?
>
> It smells like you're getting a short flood of traffic. If you can see
> what type of traffic you're getting at the time of the load spike you can
> reproduce it yourself... Load the page yourself, time how long it takes to
> render, then break it down and see what it's doing.
>
> If it's related to memcached, it's still likely to be a bug in how you're
> using it internally (looping wrong, or something) - since your load is
> related to the number of apache procs, and you claim it's not swapping,
> it's either doing disk io or running CPU hard.
>
> -Dormando
>
> On Tue, 22 Sep 2009, nsheth wrote:
>
>
>
> Hmm, just saw the same issue occur again.  Load spiked to 35-40.
> (I've set MaxClients to 40 in apache, and looking at the status page,
> I see it basically using every thread, so that may explain that load
> level).
>
> Going back on the connections, it looks like we've got about 1.2k
> connections in various states, so nowhere near any of these limits.
>
> Any other thoughts?
>
> Thanks!
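
A breakdown of connections by TCP state, like the one mentioned above, could be produced along these lines. This is a minimal sketch, assuming Linux `netstat -ant` output with the state in the sixth column; `count_tcp_states` is a hypothetical helper name:

```shell
#!/bin/sh
# count_tcp_states: read `netstat -ant`-style lines on stdin and print
# one "STATE COUNT" line per TCP state, sorted by state name.
# Header lines are skipped because they do not start with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live box (commented out; needs netstat installed):
# netstat -ant | count_tcp_states
```

Running it during a load spike and again at a quiet time shows whether a state such as TIME_WAIT or CLOSE_WAIT is piling up.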
>
> On Sep 18, 3:30 pm, nsheth wrote:
>
>
> We weren't experiencing any abnormal connection levels.
>
> I did upgrade to the latest client and server version 1.4.1.  So far
> so good . . .
>
> On Sep 15, 10:36 pm, nsheth wrote:
>
>
>
> The machine isn't swapping, actually.  I'll try to "catch" it
> happening next time and see if I can get more information about the
> connections used . . . and also look into upgrading to 1.4.1,
> hopefully that helps.
>
>
> On Sep 15, 6:19 pm, Vladimir wrote:
>
>
> I do question whether those would actually cause the load to spike.
> They might cause connections to be refused, but I suspect the two, i.e.
> the load spike and the refused connections, are linked. Please correct
> me if I am wrong. I just checked my tcp_time_wait metrics and they peak
> around 600 even during these load spikes.
>
>
> Eric Day wrote:
>
>
> If you discover this is a TIME_WAIT issue (too many TCP sockets
> waiting around in kernel), you can tweak this in the kernel:
>
>
> # cat /proc/sys/net/ipv4/tcp_fin_timeout
> 60
>
>
> # cat /proc/sys/net/ipv4/ip_local_port_range
> 32768   61000
>
>
> 61000 - 32768 = 28232
>
>
> (these are the defaults on Debian Linux).
>
>
> So you only have a pool of 28232 sockets to work with, and each will
> linger around for 60 seconds in a TIME_WAIT state even after being
> close()d on both ends. You can increase your port range and lower
> your TIME_WAIT value to buy you a larger window. Something to keep
> in mind though for any clients/servers that have a high connect rate.
>
>
> -Eric
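
The arithmetic in the message above can be turned into a rough ceiling on the sustained connect rate. This is a minimal sketch, assuming every closed connection pins one local port for the full 60 seconds quoted above and that all connections go to the same destination address and port; `max_connect_rate` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# max_connect_rate LOW HIGH HOLD_SECONDS
# Prints how many new outbound connections per second the port range
# LOW..HIGH can sustain if each port stays unusable for HOLD_SECONDS
# after close (integer division, so the result is rounded down).
max_connect_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Values quoted in the message: range 32768..61000, 60 second hold time.
max_connect_rate 32768 61000 60   # prints 470
```

On a live machine the two range values can be read from `/proc/sys/net/ipv4/ip_local_port_range` instead of being typed in.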
>
>
> On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
>
>
>    Too many connections in CLOSE_WAIT state?
>
>
>    Anyway, I would highly recommend installing something like Ganglia to get
>    some kind of metrics.
>
>
>    Also, at a load of 35-50 the machine is not doing much other than swapping.
>
>
>    Stephen Johnston wrote:
>
>
>      This is a total long shot, but we spent a lot of time figuring out a
>      similar issue that ended up being ephemeral port exhaustion.
>
>
>      Stephen Johnston
>
>
>      On Tue, Sep 15, 2009 at 8:27 PM, Vladimir wrote:
>
>
>        nsheth wrote:
>
>
>          About once a day, usually during peak traffic times, I hit some
>          major load issues.  I'm running memcached on the same boxes as my
>          webservers.  Load usually spikes to 35-50, and I see the apache
>          error log flooded with messages like the following:
>
>
>          [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
>          memcache_pconnect() [<a href='function.memcache-pconnect'>function.
>          memcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown
>          error
>          (0) in /var/www/html/memcache.php on line 174, referer: xxxx
>
>
>          Any thoughts?  Restart apache, and everything clears up.
>
>
>        It's PHP. I have seen something similar, but in the last couple of
>        weeks it has "cleared" itself. That could be coincidental with moving
>        to memcached 1.4.1, code changes etc. I actually have some Ganglia
>        snapshots of the behavior you are describing here:
>
>
>        http://2tu.us/pgr
>
>
>        The reason the load goes to 35-50 is that Apache starts consuming
>        greater and greater amounts of memory, which indicates a PHP memory
>        leak. Granted, it could also have something to do with session
>        garbage collection.
>
>
>          I'm running memcached 1.2.5 currently (which looks to be a bit out
>          of date at this point, so perhaps an update is in order).
>
>
>        I think that would be a wise choice.
>        Vladimir
>
>
