Re: tail repair issue (1.4.20)

Denis Samoylov Wed, 02 Jul 2014 12:38:21 -0700

Zhiwei,

Thank you for the info. But I'm still not sure this is related to hash 
table growth (see my answer to Dormando in this thread): the problem went on 
for about 3 hours and then disappeared. Or am I missing this part of the code? 
(do_item_alloc is small but has a fancy idea in it :) )
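For readers following the do_item_alloc discussion, here is a simplified, hypothetical model of the LRU tail scan that the quoted "refcount leak" comment guards. It is a sketch under stated assumptions, not memcached's code: the tail_item struct, the tail_action enum and both functions are invented names. The rule it encodes is our reading of the 1.4.x logic: an item is reusable only if nobody else holds a reference, and a reference held longer than a tail_repair_time threshold is assumed leaked and forcibly reset, with 0 meaning repair is off. The model also treats a repaired item as immediately reclaimable, which simplifies what the server actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an item at the LRU tail
 * (not memcached's real struct item). */
typedef struct {
    unsigned int refcount; /* 1 = only the cache itself references it */
    uint32_t time;         /* last access time on the server's clock */
} tail_item;

typedef enum {
    TAIL_EVICTABLE, /* no other holder: memory can be reused */
    TAIL_BUSY_SKIP, /* someone holds it (or leaked it): try the next item */
    TAIL_REPAIR     /* held too long: assume a leaked refcount and reset it */
} tail_action;

/* Classify one tail item. tail_repair_time == 0 models repair disabled. */
static tail_action classify_tail_item(tail_item *it, uint32_t now,
                                      uint32_t tail_repair_time) {
    assert(it->refcount >= 1);
    /* If taking our own reference would bring the count to exactly 2,
     * only the cache and this allocator would hold it. */
    if (it->refcount + 1 == 2)
        return TAIL_EVICTABLE;
    /* Busy item: reset it only if repair is on and it has been held
     * longer than the threshold. */
    if (tail_repair_time != 0 && it->time + tail_repair_time < now) {
        it->refcount = 1;
        return TAIL_REPAIR;
    }
    return TAIL_BUSY_SKIP;
}

/* Look at no more than `tries` items from the tail. Returns the index of
 * the first item the model treats as reclaimable, or -1, which stands in
 * for an "allocation failure". */
static int scan_tail(tail_item *items, int n, int tries, uint32_t now,
                     uint32_t tail_repair_time) {
    for (int i = 0; i < n && i < tries; i++) {
        if (classify_tail_item(&items[i], now, tail_repair_time) != TAIL_BUSY_SKIP)
            return i;
    }
    return -1;
}
```

With every scanned tail item stuck at refcount 2, the scan with repair disabled returns -1, which matches a flood of "allocation failure" errors. With repair enabled, the first stale item is reclaimed instead. That reclaim is only safe if the reference really leaked: if another thread is still using the item, reclaiming it would be one plausible route to memory corruption.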


-denis

On Tuesday, July 1, 2014 10:24:20 PM UTC-7, Zhiwei Chan wrote:
>
>  Hi,
>    I think it is the same bug as issue #370. I have found a way to reproduce 
> it and submitted a fix as a pull request on GitHub.
>
> On Wednesday, July 2, 2014 5:43:49 AM UTC+8, Dormando wrote:
>>
>> Hey, 
>>
>> Can you presize the hash table (-o hashpower=nn) so that it is large enough on 
>> those servers that hash expansion won't happen at runtime? You can 
>> check the current hashpower on a long-running server via stats to know what 
>> value to set. 
>>
>> If that helps, then we might still have a bug in hash expansion. I see someone 
>> finally reproduced a possible issue there under .20. Still, .17/.19 fixed other 
>> causes of the problem pretty thoroughly. 
>>
>> On Tue, 1 Jul 2014, Denis Samoylov wrote: 
>>
>> > Hi, 
>> > We had sporadic memory corruption due to tail repair in pre-.20 versions, 
>> > so we upgraded some of our servers to .20. This Monday we observed several 
>> > crashes on the .15 servers and tons of "allocation failure" errors on the 
>> > .20 servers. This is expected, since .20 just disables "tail repair", but it 
>> > seems the problem is still there. What is interesting: 
>> > 1) there is no visible change in traffic, and usually only one slab is 
>> > affected. 
>> > 2) this always happens on several, but not all, servers :) 
>> > 
>> > Is there any way to catch this and help with debugging? I have all slab and 
>> > item stats from around the time of the incident for both .15 and .20. The 
>> > .15 case is clearly memory corruption: gdb shows that the hash function 
>> > returned 0 (line 115: uint32_t hv = hash(ITEM_key(search), search->nkey, 0);). 
>> > 
>> > So it seems we are hitting this comment: 
>> >             /* Old rare bug could cause a refcount leak. We haven't seen 
>> >              * it in years, but we leave this code in to prevent failures 
>> >              * just in case */ 
>> > 
>> > :) 
>> > 
>> > Thank you, 
>> > Denis 
>> > 
>> > -- 
>> > 
>> > --- 
>> > You received this message because you are subscribed to the Google 
>> Groups "memcached" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> an email to [email protected]. 
>> > For more options, visit https://groups.google.com/d/optout. 
>> > 
>> >
>
>
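Dormando's presizing suggestion can be made concrete with a little arithmetic, and the "bug in hash expansion" is easier to reason about once you see how a lookup picks a bucket while the table is growing. The sketch below is a simplified model, not memcached's assoc.c: all names (HASHSIZE, HASHMASK, min_hashpower, hash_state, find_bucket) are hypothetical, and both the growth trigger (more than 1.5 items per bucket) and the default hashpower of 16 are assumptions based on our reading of the 1.4.x source. As far as we know, the current value appears in stats as hash_power_level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of buckets for a given hashpower, and the mask that selects one. */
#define HASHSIZE(hp) ((uint64_t)1 << (hp))
#define HASHMASK(hp) (HASHSIZE(hp) - 1)

/* Assumption: the table grows once the item count exceeds 1.5x the bucket
 * count, and the default hashpower is 16. Returns the smallest hashpower
 * (never below the default) that holds expected_items without expanding.
 * Intended for realistic item counts only. */
static unsigned int min_hashpower(uint64_t expected_items) {
    unsigned int hp = 16;
    while (expected_items > HASHSIZE(hp) * 3 / 2)
        hp++;
    return hp;
}

/* Hypothetical snapshot of expansion state: `hashpower` is the new (larger)
 * size, and old buckets below `expand_bucket` have already been migrated. */
typedef struct {
    unsigned int hashpower;
    bool expanding;
    uint64_t expand_bucket;
} hash_state;

typedef struct {
    bool in_old_table;
    uint64_t bucket;
} bucket_ref;

/* Pick the table and bucket a lookup for hash value hv must search. */
static bucket_ref find_bucket(const hash_state *s, uint32_t hv) {
    assert(s->hashpower > 0);
    bucket_ref r;
    if (s->expanding) {
        uint64_t old_bucket = hv & HASHMASK(s->hashpower - 1);
        if (old_bucket >= s->expand_bucket) {
            /* Not migrated yet: the item still lives in the old table. */
            r.in_old_table = true;
            r.bucket = old_bucket;
            return r;
        }
    }
    r.in_old_table = false;
    r.bucket = hv & HASHMASK(s->hashpower);
    return r;
}
```

For example, a server expected to hold 10 million items gets min_hashpower(10000000) == 23: 2^22 buckets allow about 6.3 million items before growing, which is too few, while 2^23 allows about 12.6 million, so it would be started with -o hashpower=23. The find_bucket model shows why expansion is delicate: for the same hash value the answer flips from the old table to the new one as expand_bucket advances, so readers and the expansion thread must see that state consistently, or a lookup can walk the wrong chain.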
