We have a 12-node memcached (v1.2.5) cluster with ~72GB of memory (6GB per
server, ~1,300 requests/sec per server).

We've started getting "SERVER_ERROR out of memory" errors during both object
stores and counter increments. The errors are isolated to 3 of the 12 servers,
and to the same slab class (class 1) on each server.
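For what it's worth, this is roughly how we've been confirming that class 1 is
the exhausted class (a quick Python sketch, not our real tooling; the hostname
is made up, and you'd pull whatever per-class counters you care about from
"stats items" / "stats slabs"):

    import socket

    def stats(host, port, command):
        """Send one stats command to memcached and return the lines up to END."""
        s = socket.create_connection((host, port))
        try:
            s.sendall(command.encode() + b"\r\n")
            data = b""
            while not data.endswith(b"END\r\n"):
                chunk = s.recv(4096)
                if not chunk:
                    break
                data += chunk
        finally:
            s.close()
        return data.decode().splitlines()

    # Hypothetical host; class 1 is the one throwing the errors for us.
    for line in stats("cache03.example.com", 11211, "stats items"):
        if line.startswith("STAT items:1:"):
            print(line)
    for line in stats("cache03.example.com", 11211, "stats slabs"):
        if line.startswith("STAT 1:"):
            print(line)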

It seems like an out-of-memory error occurs when there are no free chunks in
the class, no additional slabs can be allocated, and no items can be evicted
from the LRU (due to non-zero refcounts).
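Just to check that I'm reading the behaviour right, here is my mental model of
the allocation path as a small self-contained Python sketch (a toy model, not
the actual C code, and it glosses over expiry handling and the -M flag):

    class SlabClass:
        """Toy model of a single memcached slab class (not the real code)."""
        def __init__(self, chunk_size, max_pages):
            self.chunk_size = chunk_size      # item size this class serves
            self.max_pages = max_pages        # stand-in for the global -m limit
            self.pages = 0
            self.free_chunks = 0
            self.lru = []                     # (key, refcount) pairs, oldest last

        def grow(self):
            """Grab another 1MB slab page, if the memory limit allows it."""
            if self.pages >= self.max_pages:
                return False
            self.pages += 1
            self.free_chunks += (1024 * 1024) // self.chunk_size
            return True

    def allocate(cls):
        if cls.free_chunks:                   # 1. reuse a free chunk
            cls.free_chunks -= 1
            return True
        if cls.grow():                        # 2. carve up a fresh slab page
            cls.free_chunks -= 1
            return True
        tail = list(range(len(cls.lru) - 1, -1, -1))[:50]
        for i in tail:                        # 3. look for an evictable item
            if cls.lru[i][1] == 0:            #    near the LRU tail
                del cls.lru[i]                #    (refcount 0 -> evict, reuse)
                return True
        return False                          # 4. "SERVER_ERROR out of memory"

If that's right, the failing servers are falling through to step 4 because
class 1 can't grow and its LRU tail is pinned by in-flight references.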

The cluster stores items with a wide range of sizes. It is certainly possible
that the item sizes that were prevalent while the cache was filling are
different from the item sizes on an ongoing basis (leading to an imperfect
slab-to-class allocation).

We're using the default "powers-of-N" growth factor (1.25).
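For reference, this is the back-of-the-envelope we use to see roughly where the
class boundaries land with a 1.25 factor (the starting chunk size is an
assumption; the authoritative numbers come from "stats slabs"):

    # Approximate chunk-size progression for growth factor 1.25. The base size
    # is an assumption (real values come from "stats slabs"); memcached also
    # rounds each class up to an 8-byte boundary.
    factor = 1.25
    size = 88                 # assumed smallest chunk size in bytes
    for cls in range(1, 11):
        print("class %2d: chunk size ~%d bytes" % (cls, size))
        size = int(size * factor)
        if size % 8:
            size += 8 - (size % 8)

Items only a few bytes over a boundary land in the next class, so a shift in
the item-size mix after the cache fills can leave one class starved while its
neighbours hold most of the pages.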

The number of errors, relative to the number of successes, is quite small, but
previously there were no out-of-memory errors at all.

Are these types of errors typical for a busy, mid-sized cluster with a wide
item-size distribution? (Or is this a harbinger of things to come ...)

thanks,

Miguel
