+1

On Tue, Jul 6, 2010 at 2:47 PM, dormando <[email protected]> wrote:
> Or you could disable the "failover" feature...
>
> On Tue, 6 Jul 2010, Darryl Kuhn wrote:
>
> > FYI - we made the change on one server and it does appear to have
> > resolved premature key expiration.
> >
> > Effectively what appears to have been happening was that every so often
> > a client was unable to connect to one or more of the memcached servers.
> > When this happened it changed the key distribution. Because the
> > connection was persistent it meant that subsequent requests would use
> > the same connection handle with the reduced server pool. Turning off
> > persistent connections ensures that if we are unable to connect to a
> > server in one instance, the failure does not persist for subsequent
> > connections.
> >
> > We'll be rolling this change out to the entire server pool and I'll
> > give the list another update with our findings.
> >
> > Thanks,
> > Darryl
> >
> > On Fri, Jul 2, 2010 at 8:34 AM, Darryl Kuhn <[email protected]> wrote:
> > Found the reset call - that was me being an idiot (I actually
> > introduced it when I added logging to debug this issue)... That's been
> > removed, however there was no flush command. Somebody else suggested it
> > may have to do with the fact that we're running persistent connections;
> > and that if a failure occurred that failure would persist and alter
> > hashing rules for subsequent requests on that connection. I do see a
> > limited number of connection failures (~5-15) throughout the day. I'm
> > going to alter the config to make connections non-persistent and see if
> > it makes a difference (however I'm doubtful this is the issue as we've
> > run with memcache server pools with a single instance - which would
> > make it impossible to alter the hashing distribution).
> >
> > I'll report back what I find - thanks for your continued input!
> >
> > -Darryl
> >
> > On Thu, Jul 1, 2010 at 12:28 PM, dormando <[email protected]> wrote:
> > > Dormando... Thanks for the response.
> > > I've moved one of our servers to use an upgraded version running
> > > 1.4.5. Couple of things:
> > > * I turned on logging last night
> > > * I'm only running -vv at the moment; -vvv generated way more logging
> > > than we could handle. As it stands we've generated ~6GB of logs since
> > > last night (using -vv). I'm looking at ways of reducing log volume by
> > > logging only specific data or perhaps standing up 10 or 20 instances
> > > on one machine (using multiple ports) and turning on -vvv on only one
> > > instance. Any suggestions there?
> >
> > Oh. I thought given your stats output that you had reproduced it on a
> > server that was on a dev instance or local machine... but I guess
> > that's related to below. Running logs on a production instance with a
> > lot of traffic isn't that great of an idea, sorry about that :/
> >
> > > Looking at the logs two things jump out at me.
> > > * While I had -vvv turned on I saw the "stats reset" command being
> > > issued constantly (at least once a second). Nothing in the code that
> > > we have does this - do you know if the PHP client does this perhaps?
> > > Is this something you've seen in the past?
> >
> > No, you probably have some code that's doing something intensely wrong.
> > Now we should probably add a counter for the number of times a "stats
> > reset" has been called...
> >
> > > * Second with -vv on I get something like this:
> > > <71 get resourceCategoryPath21:984097:
> > > >71 sending key resourceCategoryPath21:984097:
> > > >71 END
> > > <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0__type 0 86400 5
> > > >71 STORED
> > > <71 set popularProducts:2010-06-28:skinit.com:styleskins:en::2000:image_wall:0 1 86400 130230
> > > <59 get domain_host:www.bestbuyskins.com
> > > >59 sending key domain_host:www.bestbuyskins.com
> > > >59 END
> > > * Two questions on the output - what's the "71" and "59"?
> > > Second - I would have thought I'd see an "END" after each "get" and
> > > "set" however you can see that's not the case.
> > >
> > > Last question... other than trolling through code is there a good
> > > place to go to understand how to parse out these log files (I'd
> > > prefer to self-help rather than bugging you)?
> >
> > Looks like you figured that out. The numbers are the file descriptors
> > (connections). END/STORED/etc are the responses.
> >
> > Honestly I'm going to take a wild guess that something on your end is
> > constantly trying to reset the memcached instance.. it's probably doing
> > a "flush_all" then a "stats reset" which would hide the flush counter.
> > Do you see "flush_all" being called in the logs anywhere?
> >
> > Go find where you're calling stats reset and make it stop... that'll
> > probably help bubble up what the real problem is.

--
awl
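[Editor's note] The key-redistribution effect Darryl describes, where a transient connection failure shrinks the server pool seen by a persistent connection, can be sketched with a toy modulo-based distributor. This is a minimal illustration, not the PHP client's actual hashing strategy, and the server addresses are made up:

```python
# Sketch only: naive modulo server selection, as simple clients use.
# When one server drops out of the pool on a failed-over connection,
# most keys map to a different server, so cached values look "expired"
# even though they still live on the original servers.
import zlib

def pick_server(key, servers):
    # Hash the key and pick a server by modulo over the pool size.
    return servers[zlib.crc32(key.encode()) % len(servers)]

full_pool = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]  # made-up hosts
reduced_pool = full_pool[:2]  # one server "failed" on this connection

keys = [f"resourceCategoryPath21:{i}" for i in range(10000)]
moved = sum(pick_server(k, full_pool) != pick_server(k, reduced_pool)
            for k in keys)
print(f"{moved / len(keys):.0%} of keys map to a different server")
```

With modulo hashing roughly two-thirds of keys move when a 3-server pool shrinks to 2, which is why even rare connection failures on a long-lived persistent connection show up as widespread premature "expiration".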

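[Editor's note] dormando's suggestion to hunt for "flush_all" in the logs can be automated with a small scan over the -vv output. This sketch assumes the request-line shape shown in the sample above ("<fd command ..."); the exact format may vary across memcached versions:

```python
# Sketch: tally suspicious commands ("flush_all", "stats reset") per
# file descriptor (connection) in memcached -vv output. Request lines
# are assumed to look like "<71 get key"; replies like ">71 END".
import re
from collections import Counter

REQUEST = re.compile(r"^<(\d+)\s+(flush_all|stats reset)\b")

def suspicious_commands(lines):
    """Count flush_all / stats reset requests, keyed by (fd, command)."""
    counts = Counter()
    for line in lines:
        m = REQUEST.match(line.strip())
        if m:
            fd, cmd = m.groups()
            counts[(int(fd), cmd)] += 1
    return counts

sample = [
    "<71 get resourceCategoryPath21:984097:",
    ">71 END",
    "<59 stats reset",
    "<59 flush_all",
    "<59 stats reset",
]
print(suspicious_commands(sample))  # two "stats reset" and one "flush_all", all on fd 59
```

Grouping by file descriptor narrows the culprit to a particular connection, which can then be matched back to a client host and the offending code path.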