Hi Zhiwei,

It seems that some sort of event takes place, and from that point forward
all (or most, hard to say) keys return seemingly arbitrary results. This
persists until memcached is flushed or restarted.

I'll try the stats cachedump approach you recommended and see what that yields.

Thanks for the tip!

Mike

On Wednesday, November 19, 2014 11:05:45 PM UTC-4, Zhiwei Chan wrote:
>
> Does the key always get the wrong value, or only sometimes? If it always
> gets the wrong value, the value was probably wrong when it was stored (or
> overwritten). If it only gets the wrong value at random, the corruption
> is probably happening in the get command.
>   For the first case, you can use the "stats cachedump" command to check
> whether the key is correct. If the key is correct and you know how to use
> gdb, get a core file and check whether the value of the key is correct.
> Check the "refcount" of the key; it should be 1 most of the time. If all
> of the wrong keys have a refcount greater than 1, that may be a refcount
> leak. Some bugs in older versions could cause this, and they are fixed in
> the latest version, so I suggest you upgrade.
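A minimal Python sketch of the "stats cachedump" check, assuming the reply format of one "ITEM <key> [<bytes> b; <expiry> s]" line per item followed by "END" (as in the 1.4.x series). The function names and the EXPECTED_PREFIXES list are hypothetical; the prefixes are the session_ and lab_ ones mentioned later in this thread, and you would extend the list with every prefix the application really writes (translations, etc.). It only flags keys that don't look like anything the application would store.

```python
import re
from typing import NamedTuple

# One line of a "stats cachedump <slab_id> <limit>" reply, e.g.
#   ITEM session_42 [118 b; 1416450000 s]
# (value size in bytes, then the expiry as a Unix timestamp).
ITEM_LINE = re.compile(r"^ITEM (\S+) \[(\d+) b; (\d+) s\]$")


class DumpedItem(NamedTuple):
    key: str
    size: int
    expiry: int


def parse_cachedump(raw: str) -> list[DumpedItem]:
    """Parse the text of a cachedump reply into (key, size, expiry) records."""
    items = []
    for line in raw.splitlines():
        line = line.strip()  # also drops the trailing "\r"
        if not line or line == "END":
            continue
        match = ITEM_LINE.match(line)
        if match is None:
            raise ValueError(f"unexpected cachedump line: {line!r}")
        key, size, expiry = match.groups()
        items.append(DumpedItem(key, int(size), int(expiry)))
    return items


# Hypothetical: the key prefixes this application is known to write.
EXPECTED_PREFIXES = ("session_", "lab_")


def suspicious_keys(items: list[DumpedItem]) -> list[str]:
    """Return the keys that don't start with any expected prefix."""
    return [item.key for item in items if not item.key.startswith(EXPECTED_PREFIXES)]
```

Running this per slab id (the ids shown by "stats slabs") after a failure gives a quick answer to "is the key itself correct?" before reaching for gdb.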
>   For the second case, I think it would crash, rather than just return
> wrong values for most of the keys.
>   I also suggest upgrading memcached if you don't know how to use gdb.
> This is the first time I have heard of this kind of issue (I use 1.4.15
> and 1.4.20 in a big cache cluster).
>   Good luck. 
>
> -----Original Message----- 
> From: [email protected] [mailto:[email protected]] On Behalf Of dormando
> Sent: Thursday, November 20, 2014 9:09 AM 
> To: [email protected]
> Subject: Re: Diagnosing Corruption 
>
> You're probably getting spaces or newlines into your keys, which can cause
> the client protocol to desync from the server. Then you'll get all sorts of
> junk in random keys (or random responses from keys that are fine).
>
> Either filtering those or using the binary protocol should fix that for 
> you. 
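To make the desync concrete, here is a rough Python model (not memcached's actual parser) of how an ASCII-protocol server splits incoming bytes: one command per newline, tokens separated by spaces. A space in a key turns a single-key get into a multi-key get, and a newline in a key turns one request into two commands, so the server sends replies the client isn't expecting; depending on the client, later replies can then be matched to the wrong requests. The validator applies the key rules from memcached's protocol.txt (at most 250 characters, no control characters or whitespace). The function names are illustrative.

```python
MAX_KEY_BYTES = 250  # key length limit from memcached's protocol.txt


def is_valid_key(key: str) -> bool:
    """True if the key is safe to send over the memcached ASCII protocol."""
    raw = key.encode("utf-8")
    if not 0 < len(raw) <= MAX_KEY_BYTES:
        return False
    # Reject space and ASCII control characters (including \r, \n, \t and DEL).
    return all(byte > 0x20 and byte != 0x7F for byte in raw)


def split_commands(wire: str) -> list[list[str]]:
    """Rough model of server-side parsing: a command ends at each '\\n'
    (a '\\r' just before it is dropped) and tokens are split on spaces."""
    commands = []
    for line in wire.split("\n"):
        line = line.rstrip("\r")
        if line:
            commands.append([token for token in line.split(" ") if token])
    return commands


if __name__ == "__main__":
    # A well-formed request is one command with one key...
    print(split_commands("get session_abc\r\n"))  # [['get', 'session_abc']]
    # ...a space in the key silently makes it a two-key get...
    print(split_commands("get session_abc def\r\n"))  # [['get', 'session_abc', 'def']]
    # ...and a newline in the key makes one request into two commands;
    # the server answers the stray "x" line with its own ERROR reply.
    print(split_commands("get lab_1\nx\r\n"))  # [['get', 'lab_1'], ['x']]
```

Checking every key with is_valid_key before each get and set, and logging the ones that fail, would show fairly quickly whether this is what's happening.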
>
> On Wed, 19 Nov 2014, [email protected] wrote:
>
> > Hi Boris, 
> > I think I may have misled you. It is not one or two keys that get
> > corrupted; it seems that most (if not all) keys fetched return incorrect
> > data.
> > For example, during one of these failures (just this morning), a
> > session key (prefixed with session_) returned an array related to a
> > customer record (prefixed with lab_), a key related to a customer returned
> > a string related to a translation, and a key related to a translation
> > returned....
> > 
> > All heck breaks loose (seemingly) across all keys. A flush brings
> > things back into the fold.
> > 
> > Make sense? 
> > 
> > Thanks, 
> > 
> > Mike 
> > 
> > 
> > On Wednesday, November 19, 2014 2:22:50 PM UTC-4, Boris wrote: 
> >       I can think of many ways to screw up an application in the way
> >       you describe. Simple programmer error can lead to this sort of
> >       behavior. I'd just log every time you do a set for that key, along
> >       with the value type you are setting.
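A minimal Python sketch of Boris's logging idea, extended slightly: log the value type on every set, store each value together with the key it was written under, and check on every get that the key inside that envelope matches the key requested. A mismatch means the reply belongs to a different key (a client/protocol mix-up), while a matching key with the wrong type points back at the code doing the set. AuditedCache and DictClient are hypothetical; the wrapped client is assumed to expose set(key, value) and get(key) returning None on a miss, and to serialize the dict itself.

```python
import logging
from typing import Any, Optional

log = logging.getLogger("cache_audit")


class AuditedCache:
    """Wraps a cache client, tagging every stored value with its own key."""

    def __init__(self, client: Any) -> None:
        self.client = client

    def set(self, key: str, value: Any) -> Any:
        log.info("set %s (%s)", key, type(value).__name__)
        return self.client.set(key, {"key": key, "value": value})

    def get(self, key: str) -> Optional[Any]:
        envelope = self.client.get(key)
        if envelope is None:
            return None  # ordinary cache miss
        if not isinstance(envelope, dict) or envelope.get("key") != key:
            # The reply belongs to some other key: log it and treat it as a miss.
            log.error("asked for %r but got %r", key, envelope)
            return None
        return envelope["value"]


class DictClient:
    """In-memory stand-in for a real client, for trying the wrapper out."""

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> bool:
        self.data[key] = value
        return True

    def get(self, key: str) -> Any:
        return self.data.get(key)


if __name__ == "__main__":
    cache = AuditedCache(DictClient())
    cache.set("session_1", {"user": 7})
    cache.set("lab_9", ["customer record"])
    print(cache.get("session_1"))  # {'user': 7}
    # Simulate a desynced reply: session_1 now returns lab_9's envelope.
    cache.client.data["session_1"] = cache.client.data["lab_9"]
    print(cache.get("session_1"))  # None (and an error is logged)
```

Treating a mismatch as a miss also keeps the wrong data from reaching users while you investigate.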
> > 
> >       On Wed, Nov 19, 2014 at 1:00 PM, <[email protected]> wrote: 
> >             Thanks Boris, 
> > I haven't really given that much thought. Out of curiosity, why do
> > you think the issue might be on the client end? I ask because I really
> > don't have a sense of what to look for on that end, and I wonder if you
> > might have some suggestions.
> > 
> > Best, 
> > 
> > Mike 
> > 
> > 
> > On Wednesday, November 19, 2014 12:46:16 PM UTC-4, Boris wrote: 
> >       Hi Mike, this sounds to me more like a client/coding error than a
> >       memcached server problem. That's where I would focus first.
> > Boris 
> > 
> > On Wed, Nov 19, 2014 at 11:41 AM, <[email protected]> wrote: 
> >       I just had another failure. After pulling down my Apache web
> >       servers, and before restarting memcached, I grabbed
> >       stats to see if they showed anything of interest:
> >  - All 3 servers were reporting for duty following a getServerStatus 
> > (PHP client call) 
> >  - curr_connections was 8 on all the instances (Apache was down but cron
> > jobs were up, so that would have dropped the count considerably)
> >  - listen_disabled_num was listed as 0 across all the instances 
> >  - accepting_conns was listed as 1 across all the instances 
> >  - evictions listed as 0 
> >  - All items across all instances had evicted, evicted_nonzero,
> > and evicted_time values of 0
> >  - All slabs across all instances had a total_pages value of 1 
> >  - tailrepairs and outofmemory are listed with a value of 0 across all
> > items in each instance
> >  - global hit rate is 0.9937 
> >  - get_hits is always* greater than cmd_set on a per-slab basis. *One
> > slab reported both values as equal
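To capture these numbers automatically on every failure rather than by hand, here is a rough Python sketch that parses the ASCII "stats" reply ("STAT <name> <value>" lines ending in "END"). "stats slabs" uses the same format with per-slab names such as 1:get_hits, which per_slab groups by slab id. The function names and the particular checks are illustrative. These counters describe the server's own state, so on their own they would not necessarily show a client reading replies out of order.

```python
from collections import defaultdict


def parse_stats(raw: str) -> dict[str, str]:
    """Parse a 'stats' or 'stats slabs' reply: 'STAT <name> <value>' lines, then END."""
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == "END":
            continue
        parts = line.split(" ", 2)
        if len(parts) != 3 or parts[0] != "STAT":
            raise ValueError(f"unexpected stats line: {line!r}")
        stats[parts[1]] = parts[2]
    return stats


def per_slab(stats: dict[str, str]) -> dict[int, dict[str, int]]:
    """Group 'stats slabs' entries like '1:get_hits' by slab id."""
    slabs: dict[int, dict[str, int]] = defaultdict(dict)
    for name, value in stats.items():
        slab_id, sep, field = name.partition(":")
        if sep and slab_id.isdigit():
            slabs[int(slab_id)][field] = int(value)
    return dict(slabs)


def health_warnings(general: dict[str, str]) -> list[str]:
    """Flag the general-stats values checked above when they look unhealthy."""
    found = []
    if general.get("accepting_conns") != "1":
        found.append("server is not accepting connections")
    for name in ("listen_disabled_num", "evictions"):
        if int(general.get(name, "0")) > 0:
            found.append(f"{name} is non-zero")
    return found
```

Logging the parsed output alongside a timestamp each time the corruption is noticed would make it easy to compare a bad period against a normal one.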
> > 
> > 
> > As far as I can tell, memcache is reporting that the world is fine and
> > dandy. Should I be widening the scope of the search to look at OS-related
> > factors that could result in the client receiving bad data? None of the
> > machines are dipping into swap.
> > 
> > Thanks, 
> > 
> > Mike 
> > 
> > 
> > 
> > On Wednesday, November 19, 2014 9:35:19 AM UTC-4, [email protected] wrote:
> >       For what it's worth, I'm hesitant to upgrade memcached to the
> >       latest version as a step toward solving this issue. Since our
> >       installs have been running without issue for quite some time
> >       (close to a year), it seems to me that there are other variables
> >       at play here. I just don't understand the variables. ;)
> >
> > Thanks,
> > 
> > Mike 
> > 
> > 
> > On Tuesday, November 18, 2014 2:00:46 PM UTC-4, [email protected] wrote:
> >       Hi there,
> > I'm trying to diagnose a new problem with Memcache that seems to be
> > happening with greater frequency. The issue has to do with memcache get
> > requests returning incorrect responses (data from other keys being
> > returned).
> > Restarting or flushing the servers seems to resolve the issue. 
> > 
> > Do any memcache veterans have any suggestions for how I might dig into
> > this issue? Stats that I might want to trace, log files to look at,
> > etc.? Does this symptom maybe fit the description of any known issues?
> > 
> > I'm keeping a casual eye on curr_connections, listen_disabled_num,
> > accepting_conns, bytes, and limit_maxbytes (none show anything unusual).
> > I've verified that all servers and clients are set up in a consistent
> > fashion. I'm not sure where to go from here to better understand the
> > problem.
> > 
> > 
> > If it helps, I'm running 1.4.13 (Ubuntu 12.04 LTS) across 3 servers,
> > connecting with PHP Memcache 3.0.6.
> > 
> > 
> > Tips? 
> > 
> > Mike 
> > 
> > 
> > 
> >   
> > 
> > -- 
> > 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups "memcached" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected]. 
> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > 
>
>
