tail repair issue (1.4.20)

Denis Samoylov Tue, 01 Jul 2014 14:25:41 -0700

Hi,

We had sporadic memory corruption due to tail repair in pre-.20 versions, so we 
updated some of our servers to .20. This Monday we observed several crashes on 
.15 and tons of "allocation failure" errors on .20. This is 
expected, as .20 just disables "tail repair", but it seems the problem is 
still there. What is interesting:
1) there is no visible change in traffic, and usually only one slab is 
affected.
2) this always happens on several, but not all, servers :)


Is there any way to catch this and help with debugging? I have all slab and 
item stats for the time around the incident for both the .15 and .20 versions. 
.15 is clearly memory corruption: gdb shows that the hash function returned 0 
(line 115 uint32_t hv = hash(ITEM_key(search), search->nkey, 0);).

So we seem to be hitting this comment:
            /* Old rare bug could cause a refcount leak. We haven't seen
             * it in years, but we leave this code in to prevent failures
             * just in case */

:)

Thank you,
Denis
