Hi Steve,

On Aug 20, 7:10 pm, stevemac <[EMAIL PROTECTED]> wrote:
> OK, this problem is almost undoubtedly in "Fast" but this group is
> probably the most up to date on all the various things that use
> memcached, so I'll post the question here.
>
> As part of stress testing we populate several instances of memcached
> with various data then run scripts against it.
>
> In one test we pound 3 instances of memcached with 100,000 gets each
> as fast as we can. Since we have seeded these instances we know we
> shouldn't get any errors.
>
> We also do it in two ways. First using a single instance of
> Cache::Memcached::Fast for all operations and everything is fine. We
> do it generating a new Cache::Memcached::Fast for each query and we
> get an empty/null result (as if the data wasn't there) about 3% of the
> time.
>
> Doing exactly the same thing but using the (perl) Cache::Memcached
> module instead we get 100% success, even when creating a new object
> for each query.

We are running essentially the same test suite and are seeing similar
results. We aren't 100% sure what's causing the problem, but maybe I
can share a few things that we've noticed.

The errors only seem to come in bursts, i.e. they aren't randomly
distributed throughout the 100k tests but occur consecutively, for
example from 28000 to 29000, stop, and then again from 52000 to 53000.
This led us to believe that it might have something to do with a
congested network or some sort of limit on open files.

We then ran the tests again and monitored the number of TCP connections
on the host running the client library with:

    netstat -t | wc -l

We noticed that whenever the errors started to occur, there were a very
large number of TCP connections in the TIME_WAIT state (about 16000).
By adding some more logging we were then able to find out that things
started to run smoothly again when this value sank to around
9000-10000.

We're guessing that some limit on open files per process is being
reached on the machine running the client library, and that this
prevents the script from instantiating a new Cache::Memcached::Fast
object. Undef'ing the mc object as fast as we can in the loop seems to
cause a buildup of TCP connections that are not properly closed and
have to be cleaned up by the OS. I can only presume that this has
something to do with how the socket is being (or not being) closed.

> We know this is an extreme case, and the chance of hitting these kind
> of data rates in the world is exceedingly small, but has anyone
> noticed this kind of thing happening? If so, is there a fix??

I don't by any means think that this is an extreme case, as we first
witnessed this behavior in a production environment and were then able
to reproduce it in our test suite.

Regards,
Erick
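
The `netstat -t | wc -l` check above counts every line of output, so any header lines and sockets in other states are included in the total. A minimal sketch that counts only the TIME_WAIT sockets might look like the following. It assumes the Linux net-tools `netstat` layout, where the state is the last column of each connection line, and `count_time_wait` is a hypothetical helper name rather than anything that ships with netstat.

```shell
#!/bin/sh
# Count TCP connections in the TIME_WAIT state.
# Reads netstat-style output on stdin. Assumption: the connection state
# is the last whitespace-separated field on each line.
# count_time_wait is a hypothetical helper name used for illustration.
count_time_wait() {
    # n + 0 forces a numeric 0 when no line matched.
    awk '$NF == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Usage, assuming a Linux net-tools netstat (-n skips name lookups):
#   netstat -tan | count_time_wait
```

Logging this number next to the test's iteration counter should make it easier to see whether the error bursts line up with the TIME_WAIT count.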
