Hi there,

Background: we are running memcached across several clusters, on about
30-40 servers, with lots of RAM dedicated to memcached.

In short:
1) I know there is no active expiry in memcached (items only expire
lazily, when they are touched).
2) I know memcached is not a database and never will be.

But it would be good if there were one, because that would make
memcached's behaviour more predictable. Let me explain.

As the cache grows, memcached always allocates new 1 MB pages to its
slab classes. These pages are dedicated to that slab class forever,
until power fails or memcached is restarted (at least if
ENABLE_SLABS_REASSIGN is not used). Over time, memcached's memory
layout should become a good footprint of the needed sizes (the more
objects arrive with size X, the more RAM ends up in the corresponding
slab class).
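
For intuition, here is a rough sketch of that size-to-slab mapping in
Python. The base chunk size of 96 bytes is an assumption on my side;
the 1.25 growth factor is the memcached default and roughly matches the
class sizes in the memcached-tool output below (real chunk sizes are
also rounded for alignment):

# Rough sketch of how memcached maps item sizes to slab classes.
# BASE_CHUNK = 96 bytes is an assumption; GROWTH_FACTOR = 1.25 is the
# memcached default and roughly matches the table further down.
GROWTH_FACTOR = 1.25
BASE_CHUNK = 96
PAGE_SIZE = 1024 * 1024        # each page given to a slab class is 1 MB

def slab_classes(max_size=PAGE_SIZE):
    """Yield (class_id, chunk_size), each class ~1.25x the previous."""
    size, cls = BASE_CHUNK, 1
    while size <= max_size:
        yield cls, size
        size, cls = int(size * GROWTH_FACTOR), cls + 1

def class_for(item_size):
    """Return the first class whose chunk fits the item. A 1 MB page
    assigned to that class is carved into PAGE_SIZE // chunk_size
    chunks and belongs to the class from then on."""
    for cls, chunk in slab_classes():
        if item_size <= chunk:
            return cls, chunk
    raise ValueError("item larger than any slab class")

# e.g. class_for(5000) picks a ~5.2 kB chunk class; exact class ids
# depend on the base size and alignment.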

Now here's the problem:
If the environment changes and the distribution of object counts/sizes
changes, memcached can no longer handle all store requests well. Some
slab classes hold only a few pages, and there may be no memory left
within the configured limit to allocate a new 1 MB page for them.

But reassigning slabs is not really usable when we are talking about
several GB of RAM and a possible need to reassign hundreds or thousands
of pages, as you can only reassign one page at a time, and only when it
is full...

I'm not speaking of small memcaches; I'm talking about e.g. 2-4 GB of
RAM per memcached instance.
Let me give an example: 3 GB of RAM for memcached.
memcached-tool gives this output after some 20 days of uptime:
  #   Item_Size   Max_age   1MB_pages Count   Full?  evicted  outofmem
[...]
 13      1.7 kB    721372 s       1      12      no       0       0
 14      2.1 kB    822425 s       6    2367      no       0       0
 15      2.6 kB    746706 s      19    7364     yes       0       0
 16      3.3 kB    750372 s      29    8985     yes       0       0
 17      4.1 kB    832983 s     313   77622     yes       0       0
 18      5.2 kB    635906 s    1941  384318     yes       0       0
 19      6.4 kB    656661 s     663  104753     yes       0       0
 20      8.1 kB    607873 s      70    8887     yes       0       0
 21     10.1 kB    654811 s       9     907     yes       0       0
 22     12.6 kB    833110 s       5     377      no       0       0
 23     15.8 kB    798629 s       4     254     yes       0       0
 24     19.7 kB    684625 s       7     356     yes       0       0
 25     24.6 kB    732865 s       1      23      no       0       0
 26     30.8 kB    821786 s       1       5      no       0       0
 27     38.5 kB    602869 s       1       9      no       0       0
 32    117.5 kB      8119 s       1       1      no       0       0

You can see that nearly all RAM goes into slab classes 17, 18 and 19,
while lots of slab classes have only one page (= 1 MB) and some have no
page at all.

If more items now arrive for those rarely used slab classes, they
cannot be stored (or not stored for long enough); they get evicted.

using of "-M" for returning errors ist not an ideal solution either.
changing the chunk sizes also would only move the problem to
another slab/slab border. also we have different cache statistics
on different clusters, so one size fits all does not work :)

To work around the evictions, I wrote a memcache expire script. It is a
client-side program that does a "stats cachedump <slab> 20000",
compares each item's timestamp against the current time, and sends an
explicit "delete <key>" to the cache for every expired item.

As I can only fetch the first 20,000 items (due to the 2 MB limit on
the cachedump return buffer), I mostly get the active items, so I can
only delete "some" of the old items.

The script uses usleeps and runs about 40 seconds (give or take) per
slab class that contains more than 20,000 items, deleting e.g. 12,000
items per run.
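
For reference, here is a minimal sketch of what such a client-side
sweep looks like (Python, talking the plain ascii protocol). Host, port
and the pacing delay are placeholders; also note that depending on the
memcached version the bracketed timestamp in the cachedump output is
the expiry time or the last-access time, so verify before trusting it:

#!/usr/bin/env python3
# Minimal sketch of the client-side expiry sweep described above.
import re
import socket
import time

HOST, PORT = "127.0.0.1", 11211   # placeholder instance address
CACHEDUMP_LIMIT = 20000           # the server caps the dump at ~2 MB anyway

ITEM_RE = re.compile(r"^ITEM (\S+) \[(\d+) b; (\d+) s\]")

def read_reply(sock):
    # Read until the ascii protocol terminates the reply.
    buf = b""
    while not (buf.endswith(b"END\r\n") or buf.endswith(b"ERROR\r\n")):
        chunk = sock.recv(65536)
        if not chunk:
            break
        buf += chunk
    return buf.decode(errors="replace").splitlines()

def expire_slab(sock, slab):
    # Dump one slab class, then delete every item whose timestamp
    # (treated here as an absolute expiry time) lies in the past.
    sock.sendall(("stats cachedump %d %d\r\n" % (slab, CACHEDUMP_LIMIT)).encode())
    now = time.time()
    deleted = 0
    for line in read_reply(sock):
        m = ITEM_RE.match(line)
        if not m:
            continue
        key, ts = m.group(1), int(m.group(3))
        if 0 < ts < now:              # 0 would mean "never expires"
            sock.sendall(("delete %s\r\n" % key).encode())
            sock.recv(4096)           # DELETED or NOT_FOUND
            deleted += 1
            time.sleep(0.002)         # usleep-style pacing to spare the server
    return deleted

if __name__ == "__main__":
    s = socket.create_connection((HOST, PORT))
    for slab in range(1, 43):         # 42 is an assumed upper bound on class ids
        n = expire_slab(s, slab)
        if n:
            print("slab %d: deleted %d items" % (slab, n))
    s.close()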

After a few runs the cache really gets cleared out, so now I know how
much RAM it would actually need: only a few percent of the total.

Same cache, still up, but expired:
  #   Item_Size   Max_age   1MB_pages Count   Full?  evicted  outofmem
[...]
 13      1.7 kB    296634 s       1       7      no       0       0
 14      2.1 kB    603126 s       6    2041      no       0       0
 15      2.6 kB    603129 s      19    4483     yes       0       0
 16      3.3 kB    603127 s      29    6195     yes       0       0
 17      4.1 kB    603129 s     313    7191     yes       0       0
 18      5.2 kB    602231 s    1941    9129     yes       0       0
 19      6.4 kB    603124 s     663    2550     yes       0       0
 20      8.1 kB    515769 s      70     586     yes       5       0
 21     10.1 kB    304993 s       9     184     yes       2       0
 22     12.6 kB    515764 s       5      59     yes       0       0
 23     15.8 kB     66440 s       4      29     yes       0       0
 24     19.7 kB    327439 s       7      57     yes       0       0
 25     24.6 kB     63584 s       1       6      no       0       0
 26     30.8 kB      1564 s       1       2      no       0       0
 27     38.5 kB    303377 s       1       6      no       0       0
 32    117.5 kB     37077 s       1       1      no       0       0

You see the same memory footprint, but you can imagine the waste of
memory: e.g. slab class 18: fewer than 10,000 objects x 5.2 kB, so some
50 MB in use within nearly 2 GB of allocated pages.
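
The arithmetic behind that, taken from the slab 18 row above:

# Waste in slab class 18 after the expiry sweep (numbers from the table):
pages, count, chunk_kb = 1941, 9129, 5.2
used_mb = count * chunk_kb / 1024.0
print("used ~%.0f MB of %d MB allocated" % (used_mb, pages))
# -> used ~46 MB of 1941 MB allocated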

At the same time it shows some evicted items in slab classes 20/21.
(OK, we also have other memcaches where evictions run at about 2%
(evicted/total), and that would be far too much for e.g. using
memcached as a session store.
Again: I know memcached is no database and never will be.)

By now I have started some tests that send my expire run regularly
(every 30 minutes) to a freshly started cache. It does not even grow
large: it is allowed to use 3 GB of RAM and stays way below 300 MB.
That way it has lots of spare pages left to allocate for the rarely
used slab classes.

But it is quite a waste of CPU cycles to expire such a tremendous cache
client-side. This should be done server-side. It is just wrong to have
the CPUs extract the keys, prepare the cachedump, send it over the net,
parse the cachedump client-side, compare the timestamps (which could be
done server-side instead of preparing the cachedump), then send
thousands of delete requests back to the server and waste bandwidth a
second time.

If expiration is not to be triggered automatically (that seems to be
part of the philosophy), then it should at least be possible to trigger
it from the client side. I imagine another memcache command; in the
ascii protocol it could look like this:

expire <slab> <limit>

where <slab> would be the slab class number and <limit> an optional cap
on deletions, e.g. expire at most 1000 entries. (OK, it could be very
fast inside memcached to delete some 100,000 entries or even millions,
as it is only RAM, but I feel safer with a limit.)

The answer could look like this:

EXPIRED SLAB #19
1843 ITEMS DELETED
2242 ITEMS ACTIVE
END
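
To make the intent concrete, here is how a client could drive that
command over the ascii protocol. To be clear, the "expire" verb and
this reply format exist nowhere yet; they are exactly the proposal
above:

import socket

def expire_slab(host, port, slab, limit=None):
    # Send the proposed (hypothetical!) 'expire' command and parse the
    # reply format sketched above. Nothing here exists in memcached today.
    cmd = "expire %d" % slab
    if limit is not None:
        cmd += " %d" % limit
    with socket.create_connection((host, port)) as s:
        s.sendall((cmd + "\r\n").encode())
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
    stats = {}
    for line in buf.decode().splitlines():
        if line.endswith("ITEMS DELETED"):
            stats["deleted"] = int(line.split()[0])
        elif line.endswith("ITEMS ACTIVE"):
            stats["active"] = int(line.split()[0])
    return stats

# expire_slab("127.0.0.1", 11211, 19, 1000)
# -> {'deleted': 1843, 'active': 2242}  (per the example reply above)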

So, now the questions to the maintainer folks:
a) Any comments on this?
b) Does anyone besides me need this?
c) I could imagine trying to implement it on my own (my C is a bit
   dusty, but at least I have been able to read the code so far), but
   this clearly only makes sense if the patch would make its way into
   the sources and stay there.
d) If it gets implemented, it could go into the binary protocol as
   well, but that would be beyond my scope.

best regards

Werner Maier
