Hi,
We had sporadic memory corruption due to tail repair in pre-.20 versions, so we
updated some of our servers to .20. This Monday we observed several crashes in
the .15 version and tons of "allocation failure" errors in the .20 version. This
is expected, as .20 just disables "tail repair", but it seems the underlying
problem is still there. What is interesting:
1) There is no visible change in traffic, and usually only one slab is
affected.
2) This always happens on several, but not all, servers :)
Is there any way to catch this and help with debugging? I have all slab and
item stats for the time around the incident for both the .15 and .20 versions.
On .15 it is clearly memory corruption: gdb shows that the hash function
returned 0 (line 115, uint32_t hv = hash(ITEM_key(search), search->nkey, 0);),
so we seem to be hitting this comment:
/* Old rare bug could cause a refcount leak. We haven't seen
* it in years, but we leave this code in to prevent failures
* just in case */
:)
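
In case it helps narrow this down, here is a minimal sketch of how two saved
"stats items" snapshots could be compared to see which slab class had its
counters grow around the incident. It assumes the snapshots are raw text in
the usual "STAT items:<class>:<name> <value>" form and that the outofmemory
and tailrepairs counters are the ones of interest. The function names
parse_items_stats and counter_deltas are made up for illustration and are not
part of memcached.

```python
import re
from collections import defaultdict

# Matches lines such as "STAT items:5:outofmemory 12".
STAT_RE = re.compile(r"^STAT items:(\d+):(\w+) (\d+)$")


def parse_items_stats(text):
    """Return {slab_class: {stat_name: value}} from raw `stats items` text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            slab_class, name, value = match.groups()
            stats[int(slab_class)][name] = int(value)
    return dict(stats)


def counter_deltas(before, after, names=("outofmemory", "tailrepairs")):
    """Return {slab_class: {name: increase}} for the counters that grew."""
    deltas = {}
    for slab_class, current in after.items():
        previous = before.get(slab_class, {})
        grown = {
            name: current.get(name, 0) - previous.get(name, 0)
            for name in names
            if current.get(name, 0) > previous.get(name, 0)
        }
        if grown:
            deltas[slab_class] = grown
    return deltas
```

Running counter_deltas on the parsed snapshots taken before and after the
incident lists only the slab classes whose counters increased, which should
make it easier to confirm that a single slab is affected.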
Thank you,
Denis