Hi Dormando,

Just in case: some of the next paragraphs might sound offensive,
but they are not meant to offend anybody.
It could just be my writing in English, which is not
my native tongue.

> You really do want automatic slab reassignment...

Yes, this would solve everything, but only if it cleared the pages before
reassigning them, so that reassigning would not evict a single item.

> What you're doing is janky

I know.

> and doesn't really solve the problem.

I know. It's just delaying it, but delaying it long enough
if there is enough RAM.

> If anything happens that ends up filling more of your cache,
> you have to restart memcached anyway.

I know. That cannot be denied.

But you must admit that it is a simple workaround and goes
a long way towards delaying the problem. Sorry, I don't have long-term
experience yet; my expire was written as a proof of concept
on Friday and is only about to be rolled out in our production
environment this week.

I have a cluster running two memcached instances of 1 GB each, and on
these my expire is running every 30 minutes. Before the
expire, the 1 GB would fill up in about one day.

I restarted memcached on Friday evening, and since then the
typical usage is about 4-8% (depending on
whether the expire has just run or is about to run again).
Memcached itself reports around 50..80 MB in use (instead of 1 GB).

When I use memcached-tool to display the pages and so on, I see
a total of about 380 MB allocated (350 MB had already been allocated
by Sunday; since then it has hardly grown at all),
and I still have 600 MB spare on each of these two servers.
You can find the memcached-tool display below (one of the example
outputs).

> I'm not entirely sure why you're going through all of this
> trouble, instead of just restarting them occasionally when
> your hit rate starts to suffer due to changes in cache size?

+ Because we store sessions in it.
+ Because our customers are local newspapers.
+ Because their customers are end users who complain to our
  customers when they have to log in again (lost session).
+ Because our customers then complain to us when even one
  (in numbers: 1) of their customers complains.
+ Because those sites get hits even in the middle of the night,
  so I have trouble finding a good time to restart.

And finally, in your words:
+ Because restarting a service is more janky than doing an expire.
  This is *nix land; on quiet nights you might hear other machines
  reboot.
  :)

> You're overcomplicating the problem.

No, I'm not overcomplicating.
You are. I just want to have a simple expire, as this primarily solves
my problem, as long as I have enough RAM to spare. It is not a
problem for me to have 4 GB of RAM assigned to a memcached
that is only using 10% of it, but I want it to be reliable.
When I store an object for 300 seconds, I want it to HOLD that
object for 300 seconds, and not to evict it because some other
process stores another object. And I don't want it to refuse a store
because it has no memory left to assign, with all of it already given
to other slabs, which would not have happened if the cache had
been expired beforehand.
Of course this does not solve all problems.

Of course, a background thread doing the expire and maybe the
shift of items to other pages for freeing up pages to make it
possible to reassign them without eviction would be THE PERFECT
SOLUTION. But we are at the 10/90-method again.

You get 90% of the result with 10% of the work. My suggestion is
to have those 90% and do just the 10% of work. Especially
as it would only be needed by a very small number of sites.

Your suggestion is the really good correct perfect version
with 100% of result. That is not a quick fix but a complex change.

And for that there is of course the question of whether it would be
necessary, as you correctly state that most sites except some
very high-volume sites won't need it.

Sorry, our clusters have up to 20 Million page impressions
per month, and we have that problem.

> 1.4.0's stats will let you calculate hitrate per slab
> (on top of being able to monitor evictions per slab, etc).

The evictions per slab can be monitored right now with memcached-tool;
I filed the patch for memcached-tool in 1.2.x 10 weeks ago here:
http://code.google.com/p/memcached/issues/detail?id=46

  #   Item_Size   Max_age   1MB_pages Count   Full?  evicted  outofmem
[... displaying only the interesting slabs w/ >1000 evictions ...]
 13      1.7 kB    177890 s      95   57855     yes   10017       0
 14      2.1 kB   1361999 s      67   32626     yes   73509       0
 15      2.6 kB    676795 s     230   89240     yes  132292       0
 16      3.3 kB    440497 s     235   72849     yes  122294       0
 17      4.1 kB   2244198 s    1465  363319     yes   34228       0
 18      5.2 kB     55159 s     718  142164     yes  379333       0
 19      6.4 kB     92563 s     392   61932     yes   72786       0
 20      8.1 kB     88771 s      81   10287     yes   21829       0
 21     10.1 kB    117322 s     502   50702     yes    2924       0
[...]
Total size (all slabs):  4109 MB
Total size (slabs w/ > 1000 evictions): 3785 MB
Memory Usage 89%
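
The two totals above can be rechecked from the rows themselves, because
every page is 1 MB. What follows is only a minimal sketch: it assumes the
column layout shown in the listing, and the function name and the
threshold parameter are invented for illustration. They are not part of
memcached-tool.

```python
def pages_over_eviction_limit(report: str, min_evictions: int = 1000) -> int:
    """Sum the 1MB_pages column for slabs with more than min_evictions evictions."""
    total_pages = 0
    for line in report.splitlines():
        fields = line.split()
        # A data row splits into 10 fields, e.g.
        # 13 1.7 kB 177890 s 95 57855 yes 10017 0
        if len(fields) != 10 or not fields[0].isdigit():
            continue  # header, "[...]" markers and total lines are skipped
        pages, evicted = int(fields[5]), int(fields[8])
        if evicted > min_evictions:
            total_pages += pages
    return total_pages


REPORT = """\
  #   Item_Size   Max_age   1MB_pages Count   Full?  evicted  outofmem
 13      1.7 kB    177890 s      95   57855     yes   10017       0
 14      2.1 kB   1361999 s      67   32626     yes   73509       0
 15      2.6 kB    676795 s     230   89240     yes  132292       0
 16      3.3 kB    440497 s     235   72849     yes  122294       0
 17      4.1 kB   2244198 s    1465  363319     yes   34228       0
 18      5.2 kB     55159 s     718  142164     yes  379333       0
 19      6.4 kB     92563 s     392   61932     yes   72786       0
 20      8.1 kB     88771 s      81   10287     yes   21829       0
 21     10.1 kB    117322 s     502   50702     yes    2924       0
"""

print(pages_over_eviction_limit(REPORT), "MB")  # prints "3785 MB"
```

Raising the threshold (for example to 100000) narrows the sum to the
slabs that evict the most.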

Same cache if I let my expire script run:
Memory usage goes down to about 8..10%.

> Instead of mass fetching, mass deleting, and doing other generally
> unscalable things... Do you have proof that your average cache item
> size changes enough to make this worth it?

Well, I guess.
1) I don't know where else the evictions should come from.
   Note: the cache is QUITE EMPTY if I do expire.
2) See the memcached-tool output below: cache freshly restarted,
   expired every 30 minutes. memcached itself says: about 4% full.
   Now running for 5 days.
   Our normal peak is around slabs #15..#20, but
   this cache shows sizable allocations in slabs #30+:
  #   Item_Size   Max_age   1MB_pages Count   Full?  evicted  outofmem
  2      136 B        481 s       1     124      no       0       0
  3      176 B       1395 s       1     164      no       0       0
  4      224 B       1399 s       1     380      no       0       0
  5      280 B      73594 s       1      13      no       0       0
  6      352 B       5610 s       1     380      no       0       0
  7      440 B       1285 s       1     293      no       0       0
  8      552 B        854 s       1      18      no       0       0
  9      696 B       1389 s       1     145      no       0       0
 10      872 B       1399 s       2     439      no       0       0
 11      1.1 kB      1399 s       9    1793      no       0       0
 12      1.3 kB      1399 s       3     571      no       0       0
 13      1.7 kB      1396 s       1     186      no       0       0
 14      2.1 kB      1376 s       3     370      no       0       0
 15      2.6 kB      1396 s       1      90      no       0       0
 16      3.3 kB      1399 s       9     580      no       0       0
 17      4.1 kB      1396 s      19    1099      no       0       0
 18      5.2 kB      4693 s      19     991     yes       0       0
 19      6.4 kB      4688 s       6     680      no       0       0
 20      8.1 kB      4689 s       3     177      no       0       0
 21     10.1 kB      4380 s       1      25      no       0       0
 22     12.6 kB      1285 s       1      21      no       0       0
 23     15.8 kB      1077 s       1      20      no       0       0
 24     19.7 kB      4457 s       3      38      no       0       0
 25     24.6 kB      2176 s       3      21      no       0       0
 26     30.8 kB      1066 s       4      27      no       0       0
 27     38.5 kB      1375 s       5      46      no       0       0
 28     48.1 kB      1395 s      10      51      no       0       0
 29     60.2 kB      1333 s      12       4      no       0       0
 30     75.2 kB      1014 s      13       7      no       0       0
 31     94.0 kB      1393 s      27      69      no       0       0
 32    117.5 kB      1181 s      24       7     yes       0       0
 33    146.9 kB      1279 s      41       3     yes       0       0
 34    183.6 kB      1321 s      28       2      no       0       0
 35    229.5 kB         0 s       3       0      no       0       0
 36    286.9 kB      1350 s      27       7      no       0       0
 38    448.2 kB         0 s      84       0     yes       0       0
                  Total size:   370 MB

> We *do* need slab reassignment. This doesn't exist presently and isn't a
> simple or free change, but almost everyone except some very massive sites
> can get away without needing it.

As I said before: there is a choice in building houses.
You can build a perfect house out of stone with marble pillars,
but if it is raining right now, I'd prefer a wooden shack as long as
it is dry. Just referring to the 90/10 method from before.

My solution of having a slab expire in the server that normally
does nothing and must be triggered externally would IMHO be a good
solution, as it
+ would not affect anybody else (not running = no CPU cost)
+ would help those who need it (they can buy it with CPU cycles)
- would need an external script to trigger the expire (but only
  for those who need it). I for my part have that script already,
  and it's no work to rewrite it to just send "expire slab limit"
  instead of "stats cachedump"...
- would not be a perfect, golden, shimmering solution.
  But hey, who's perfect?
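
To make the external-script part concrete, here is a rough sketch of the
half that reads "stats cachedump" output and picks the keys whose expiry
has passed. It is not the actual script. The line format
"ITEM <key> [<size> b; <expiry> s]", the reading of the timestamp as an
absolute Unix time, and the treatment of 0 as "never expires" are all
assumptions, and the function names are invented. The "expire slab limit"
command itself does not exist in memcached; it is only the proposal made
in this mail.

```python
import re

# Assumed line format of a "stats cachedump <slab> <limit>" response:
#   ITEM <key> [<size> b; <expiry> s]
ITEM_RE = re.compile(r"^ITEM (\S+) \[(\d+) b; (\d+) s\]$")


def parse_cachedump(response: str):
    """Return (key, size, expiry) tuples found in a cachedump response."""
    items = []
    for line in response.splitlines():
        match = ITEM_RE.match(line.strip())
        if match:  # "END" and anything unexpected is ignored
            key, size, expiry = match.groups()
            items.append((key, int(size), int(expiry)))
    return items


def expired_keys(response: str, now: float):
    """Keys whose expiry lies in the past (assumption: 0 means 'never')."""
    return [key for key, _size, expiry in parse_cachedump(response)
            if 0 < expiry <= now]


# Hypothetical response, made up for this example.
sample = (
    "ITEM sess:a [1200 b; 1000 s]\r\n"
    "ITEM sess:b [800 b; 5000 s]\r\n"
    "ITEM conf [64 b; 0 s]\r\n"
    "END\r\n"
)
print(expired_keys(sample, now=2000))  # prints ['sess:a']
```

A script built on this would presumably issue a get for each returned
key, since fetching an expired item lets memcached free it, and repeat
that per slab. With a server-side expire command, all of this would
shrink to sending one line per slab.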

regards,

Werner.
