I've only got 2 servers hitting this.  Currently, the connection limit is
set to 1024, but I can increase that.

I'm running 1.4.1 now.  Looking at memcached stats, the value of
listen_disabled_num is 0.
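
For reference, something along these lines is enough to pull that number,
assuming nc is available:

# host/port below come from the PHP warning further down; adjust as needed
echo stats | nc -w 1 10.0.0.5 11211 | grep listen_disabled_num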

My pecl/memcache library is 2.2.5, latest stable.

Vladimir, I do have a Cacti installation.  Looking at that, I see a CPU peak
at that time, but that may just be a result of having 40 apache threads
actively churning?



On Tue, Sep 22, 2009 at 5:53 PM, dormando <[email protected]> wrote:

> Okay,
>
> Smells like you're leaking memcached connection objects somewhere, or you
> have a ton of servers? During these spikes, can you telnet to memcached
> and run the 'stats' command, or can you not connect either?
>
> Try restarting memcached with -c (connection limit) set to 32767 or
> somesuch. See if that changes things.
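>
> For example, something like this (the -u/-m values are placeholders; keep
> whatever you currently start memcached with):
>
> memcached -d -u nobody -m 1024 -p 11211 -c 32767  # example -u/-m values
>
> curr_connections in 'stats' will then show how close you get to the limit.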
>
> Is your pecl/memcache library fully upgraded?
>
> If you're using memcached 1.2.8 or later the 'stats' output has a value
> 'listen_disabled_num' - if that value is nonzero, or incrementing, you're
> hitting the connection limit on memcached.
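>
> A quick way to watch whether it's incrementing during a spike, e.g.
> (assuming nc is installed):
>
> # host/port below are just an example
> watch -n 5 "echo stats | nc -w 1 127.0.0.1 11211 | grep listen_disabled_num"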
>
> On Tue, 22 Sep 2009, nsheth wrote:
>
> >
> > I've already looked in some detail at that, but haven't been able to
> > discern any real pattern.  I'll look again, though.
> >
> > I suspect memcache, as whenever I experience this, I get a flood of
> > messages in my error log like:
> >
> > [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > memcache_pconnect() [<a href='function.memcache-pconnect'>function.
> > memcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown error
> > (0) in /var/www/html/memcache.php on line 174, referer: xxxx
> >
> > On Sep 22, 5:31 pm, dormando <[email protected]> wrote:
> > > Hey,
> > >
> > > Can you troubleshoot it more carefully without thinking it's specific
> > > to memcached? How'd you track it down to memcached in the first place?
> > >
> > > When your load is spiking, what requests are hitting your server? Can
> > > you look at an apache server-status page to see what's in flight, or
> > > re-assemble such a view from the logs?
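> > >
> > > For example, if mod_status is enabled (the URL depends on your setup):
> > >
> > > # mod_status URL is config-dependent
> > > curl -s 'http://localhost/server-status?auto' \
> > >   | grep -E 'BusyWorkers|IdleWorkers'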
> > >
> > > It smells like you're getting a short flood of traffic. If you can see
> > > what type of traffic you're getting at the time of the load spike you
> > > can reproduce it yourself... Load the page yourself, time how long it
> > > takes to render, then break it down and see what it's doing.
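> > >
> > > For example (the URL below is just a placeholder):
> > >
> > > # placeholder URL; point it at one of your slow pages
> > > time curl -s -o /dev/null http://yoursite.example/slow-page.php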
> > >
> > > If it's related to memcached, it's still likely to be a bug in how
> > > you're using it internally (looping wrong, or something) - since your
> > > load is related to the number of apache procs, and you claim it's not
> > > swapping, it's either doing disk io or running CPU hard.
> > >
> > > -Dormando
> > >
> > > On Tue, 22 Sep 2009, nsheth wrote:
> > >
> > > > Hmm, just saw the same issue occur again.  Load spiked to 35-40.
> > > > (I've set MaxClients to 40 in apache, and looking at the status page,
> > > > I see it basically using every thread, so that may explain that load
> > > > level).
> > >
> > > > Going back on the connections, it looks like we've got about 1.2k
> > > > connections in various states, so nowhere near any of these limits.
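> > >
> > > > For reference, a quick way to break those down by state is something
> > > > like:
> > >
> > > > # skip the two netstat header lines, then count by state
> > > > netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c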
> > >
> > > > Any other thoughts?
> > >
> > > > Thanks!
> > >
> > > > On Sep 18, 3:30 pm, nsheth <[email protected]> wrote:
> > > > > We weren't experiencing any abnormal connection levels.
> > >
> > > > > I did upgrade to the latest client and server version 1.4.1.  So
> > > > > far so good . . .
> > >
> > > > > On Sep 15, 10:36 pm, nsheth <[email protected]> wrote:
> > >
> > > > > > The machine isn't swapping, actually.  I'll try to "catch" it
> > > > > > happening next time and see if I can get more information
> > > > > > about the connections used . . . and also look into upgrading
> > > > > > to 1.4.1, hopefully that helps.
> > >
> > > > > > On Sep 15, 6:19 pm, Vladimir <[email protected]> wrote:
> > >
> > > > > > > I do question whether those would actually cause load to
> > > > > > > spike up. Perhaps the connection refusals would, but I
> > > > > > > suspect the two, i.e. the load spike and the connection
> > > > > > > refusals, are linked. Please correct me if I am wrong. I just
> > > > > > > checked my tcp_time_wait metrics and they peak around 600
> > > > > > > even during these load spikes.
> > >
> > > > > > > Eric Day wrote:
> > > > > > > > If you discover this is a TIME_WAIT issue (too many TCP
> > > > > > > > sockets waiting around in kernel), you can tweak this in
> > > > > > > > the kernel:
> > >
> > > > > > > > # cat /proc/sys/net/ipv4/tcp_fin_timeout
> > > > > > > > 60
> > >
> > > > > > > > # cat /proc/sys/net/ipv4/ip_local_port_range
> > > > > > > > 32768   61000
> > >
> > > > > > > > 61000 - 32768 = 28232
> > >
> > > > > > > > (these are the defaults on Debian Linux).
> > >
> > > > > > > > So you only have a pool of 28232 sockets to work with, and
> > > > > > > > each will linger around for 60 seconds in a TIME_WAIT state
> > > > > > > > even after being close()d on both ends. You can increase
> > > > > > > > your port range and lower your TIME_WAIT value to buy you a
> > > > > > > > larger window. Something to keep in mind though for any
> > > > > > > > clients/servers that have a high connect rate.
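> > >
> > > > > > > > For example (the new values below are only illustrative):
> > >
> > > > > > > > netstat -ant | grep -c TIME_WAIT   # current TIME_WAIT count
> > > > > > > > # widen the port range and lower the timeout (as root)
> > > > > > > > echo "15000 61000" > /proc/sys/net/ipv4/ip_local_port_range
> > > > > > > > echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout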
> > >
> > > > > > > > -Eric
> > >
> > > > > > > > On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
> > >
> > > > > > > >>    Too many connections in CLOSE_WAIT state ?
> > >
> > > > > > > >>    Anyways I would highly recommend installing something
> > > > > > > >>    like Ganglia to get some types of metrics.
> > >
> > > > > > > >>    Also, at a load of 35-50 the machine is not doing
> > > > > > > >>    much other than swapping.
> > >
> > > > > > > >>    Stephen Johnston wrote:
> > >
> > > > > > > >>      This is a total long shot, but we spent a lot of time
> > > > > > > >>      figuring out a similar issue that ended up being
> > > > > > > >>      ephemeral port exhaustion.
> > >
> > > > > > > >>      Stephen Johnston
> > >
> > > > > > > >>      On Tue, Sep 15, 2009 at 8:27 PM, Vladimir
> > > > > > > >>      <[email protected]> wrote:
> > >
> > > > > > > >>        nsheth wrote:
> > >
> > > > > > > >>          About once a day, usually during peak traffic
> > > > > > > >>          times, I hit some major load issues.  I'm
> > > > > > > >>          running memcached on the same boxes as my
> > > > > > > >>          webservers.  Load usually spikes to 35-50, and I
> > > > > > > >>          see the apache error log flooded with messages
> > > > > > > >>          like the following:
> > >
> > > > > > > >>          [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2]
> > > > > > > >>          PHP Warning: memcache_pconnect() [<a
> > > > > > > >>          href='function.memcache-pconnect'>function.memcache-pconnect</a>]:
> > > > > > > >>          Can't connect to 10.0.0.5:11211, Unknown error (0) in
> > > > > > > >>          /var/www/html/memcache.php on line 174, referer: xxxx
> > >
> > > > > > > >>          Any thoughts?  Restart apache, and everything
> > > > > > > >>          clears up.
> > >
> > > > > > > >>        It's PHP. I have seen something similar, but in the
> > > > > > > >>        last couple of weeks it has "cleared" itself. It
> > > > > > > >>        could be coincidental with using memcached 1.4.1,
> > > > > > > >>        code changes, etc. I actually have some Ganglia
> > > > > > > >>        snapshots of the behavior you are describing here:
> > >
> > > > > > > >>        http://2tu.us/pgr
> > >
> > > > > > > >>        The reason why load goes to 35-50 is that Apache
> > > > > > > >>        starts consuming greater and greater amounts of
> > > > > > > >>        memory, indicating a PHP memory leak. Granted, it
> > > > > > > >>        could also have something to do with session
> > > > > > > >>        garbage collection.
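> > >
> > > > > > > >>        A quick way to watch that growth, assuming the
> > > > > > > >>        apache processes are named httpd:
> > >
> > > > > > > >>        # process name may be apache2 on some distros
> > > > > > > >>        ps -C httpd -o pid,rss,cmd --sort=-rss | head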
> > >
> > > > > > > >>          I'm running memcached 1.2.5 currently (which looks
> > > > > > > >>          to be a bit out of date at this point, so perhaps
> > > > > > > >>          an update is in order).
> > >
> > > > > > > >>        I think that would be a wise choice.
> > > > > > > >>        Vladimir
> >
> >
>
