Hi Steve,

On Aug 20, 7:10 pm, stevemac <[EMAIL PROTECTED]> wrote:
> OK, this problem is almost undoubtedly in "Fast" but this group is
> probably the most up to date on all the various things that use
> memcached, so I'll post the question here.
>
> As part of stress testing we populate several instances of memcached
> with various data then run scripts against it.
>
> In one test we pound 3 instances of memcached with 100,000 gets each
> as fast as we can. Since we have seeded these instances we know we
> shouldn't get any errors.
>
> We also do it in two ways. First, using a single instance of
> Cache::Memcached::Fast for all operations, everything is fine. Then
> we do it generating a new Cache::Memcached::Fast for each query, and
> we get an empty/null result (as if the data wasn't there) about 3% of
> the time.
>
> Doing exactly the same thing but using the (perl) Cache::Memcached
> module instead, we get 100% success, even when creating a new object
> for each query.

We are running essentially the same test suite and are seeing similar
results. We aren't 100% sure what's causing the problem, but maybe I
can share a few things we've noticed. The errors only come in bursts,
i.e. they aren't randomly distributed throughout the 100k tests but
occur consecutively, for example from 28000 to 29000, stop, and then
again from 52000 to 53000. This led us to believe it might have
something to do with a congested network or some sort of limit on open
files. We then ran the tests again and monitored the number of TCP
connections on the host running the client library with:

netstat -t | wc -l

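For a per-state breakdown of the same count, something like the
following works (this assumes a netstat whose sixth output column is
the socket state, as with GNU netstat; adjust the column for other
variants):

```shell
# Break the TCP connection count down by state; the sixth column of
# `netstat -tan` is the state (TIME_WAIT, ESTABLISHED, ...), and the
# first two lines are headers.
netstat -tan | awk 'NR > 2 { count[$6]++ } END { for (s in count) print s, count[s] }'
```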
We noticed that whenever the errors started to occur there was a very
large number of TCP connections in the TIME_WAIT state (about 16000).
By adding some more logging we were able to find out that things
started to run smoothly again once this value sank to around
9000-10000. We're guessing that some per-process limit on open files
is being reached on the machine running the client library, which
prevents the script from instantiating a new Cache::Memcached::Fast.
Undef'ing the mc object as fast as we can in the loop seems to cause a
buildup of TCP connections that are not properly closed and have to be
cleaned up by the OS. I can only presume that this has something to do
with how the socket is being (or not being) closed.
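For anyone who wants to reproduce this, the two access patterns under
discussion look roughly like the sketch below (server address and key
names are placeholders, and the loop count is scaled down from the
100,000 gets of the actual test):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cache::Memcached::Fast;

# Placeholder server list for illustration.
my @servers = ('127.0.0.1:11211');

# Pattern 1: one shared client reused for every get.  The client keeps
# one persistent connection per server, so sockets never churn through
# TIME_WAIT no matter how many gets we issue.
my $shared = Cache::Memcached::Fast->new({ servers => \@servers });
for my $i (1 .. 1000) {
    my $val = $shared->get("key$i");
}

# Pattern 2: a fresh client per get.  Each object opens its own
# socket(s) and closes them on destruction, so every iteration leaves
# a connection behind in TIME_WAIT for the kernel to reap -- tens of
# thousands of them at the rates described above.
for my $i (1 .. 1000) {
    my $memd = Cache::Memcached::Fast->new({ servers => \@servers });
    my $val  = $memd->get("key$i");
    undef $memd;    # socket closed here; the TIME_WAIT entry lingers
}
```

Pattern 1 is the workaround: keep one client object alive for the life
of the process instead of constructing one per query.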

> We know this is an extreme case, and the chance of hitting these kind
> of data rates in the world is exceedingly small, but has anyone
> noticed this kind of thing happening? If so, is there a fix??

I don't by any means think that this is an extreme case: we first
witnessed this behavior in a production environment and were then able
to reproduce it in our test suite.

Regards,

Erick
