Hi Dormando,

just in case: some of the following paragraphs might sound offensive, but they are not meant to offend anybody. It could just be my writing in English, which is not my native tongue.
> You really do want automatic slab reassignment...

Yes, this would solve everything, but only if it cleared the pages before reassigning them, so that reassigning would not evict a single item.

> What you're doing is janky

I know.

> and doesn't really solve the problem.

I know. It's just delaying it, but delaying it long enough if there is enough RAM.

> If anything happens that ends up filling more of your cache,
> you have to restart memcached anyway.

I know, that cannot be denied. But you must admit that it is a simple workaround and goes a long way toward delaying the problem.

Sorry, I don't have long-term experience with it yet: my expire was written as a proof of concept on Friday and is only being rolled out to our production environment this week.

I have a cluster of two servers, each running a 1 GB memcached, and my expire runs on them every 30 minutes. Before the expire, the 1 GB would fill up in about a day. I restarted memcached on Friday evening, and since then its typical usage has been about 4-8% (depending on whether the expire has just run or the next run is about due). memcached itself reports around 50-80 MB in use (instead of 1 GB). When I use memcached-tool to display the pages and so on, I see a total of about 380 MB allocated (350 MB had already been allocated by Sunday; since then it has hardly grown), so I still have 600 MB spare on each of these two servers. You can find the memcached-tool output below (one of the example outputs).

> I'm not entirely sure why you're going through all of this
> trouble, instead of just restarting them occasionally when
> your hit rate starts to suffer due to changes in cache size?

+ Because we store sessions in it.
+ Because our customers are local newspapers.
+ Because their customers are end users, who complain to our customers when they have to log in again (lost session).
+ Because our customers then complain to us as soon as even one of their customers complains.
+ Because those sites get hits even in the middle of the night, so it is hard for me to find a good time to restart.

And finally, in your words:

+ Because restarting a service is more janky than doing an expire. This is *nix land; on quiet nights you might hear other machines reboot. :)

> You're overcomplicating the problem.

No, I'm not overcomplicating it. You are. I just want a simple expire, as that primarily solves my problem, as long as I have enough RAM to spare. It is not a problem for me to have 4 GB of RAM assigned to a memcached that only uses 10% of it, but I want it to be reliable. When I store an object for 300 seconds, I want it to HOLD that object for 300 seconds, and not evict it because some other process stores another object. And I don't want it to refuse to store an object because it has no memory left to assign, all of it being assigned to other slabs, which would not be necessary if the cache had been expired beforehand.

Of course this does not solve all problems. Of course, a background thread doing the expire, and maybe moving items to other pages to free up whole pages so they can be reassigned without evictions, would be THE PERFECT SOLUTION. But we are back at the 90/10 method: you get 90% of the result with 10% of the work. My suggestion is to take those 90% and do just the 10% of the work, especially as it would only be needed by a very small number of sites.

Your suggestion is the really good, correct, perfect version with 100% of the result. That is not a quick fix but a complex change. And that of course raises the question whether it is necessary at all, as you correctly state that most sites, except some very high-volume ones, won't need it. Sorry, but our clusters serve up to 20 million page impressions per month, and we do have that problem.

> 1.4.0's stats will let you calculate hitrate per slab
> (on top of being able to monitor evictions per slab, etc).
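Per-slab eviction counters can also be read from a raw `stats items` reply, which is presumably where memcached-tool gets its evicted column. Below is a minimal Python sketch, assuming the reply consists of `STAT items:<slab>:<name> <value>` lines (my understanding of the 1.2.x format, not checked against every release). It extracts the `evicted` counters and lists the slabs above a threshold. The function names are hypothetical, and the default threshold of 1000 mirrors the "> 1000 evictions" filter used in the memcached-tool output that follows.

```python
import re

# Assumed line format of a "stats items" reply: "STAT items:<slab>:<name> <value>".
STAT_ITEMS_RE = re.compile(r"^STAT items:(\d+):(\w+) (\d+)$")


def evictions_per_slab(reply):
    """Map slab class id -> cumulative 'evicted' counter from a raw 'stats items' reply."""
    evicted = {}
    for line in reply.splitlines():
        match = STAT_ITEMS_RE.match(line.strip())
        # Only the exact 'evicted' field counts; other fields (number, age, ...) are skipped.
        if match and match.group(2) == "evicted":
            evicted[int(match.group(1))] = int(match.group(3))
    return evicted


def eviction_delta(before, after):
    """Evictions per slab between two snapshots (the counters only grow while the process runs)."""
    return {sid: after[sid] - before.get(sid, 0) for sid in after}


def noisy_slabs(evicted, threshold=1000):
    """Slab ids with more than `threshold` evictions, like the filtered memcached-tool view."""
    return sorted(sid for sid, count in evicted.items() if count > threshold)
```

Because the counters keep growing for as long as the process runs, comparing two snapshots taken a few minutes apart with `eviction_delta` says more about the current state than the raw totals do.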
Evictions per slab can already be monitored right now with memcached-tool; 10 weeks ago I filed a patch for memcached-tool in 1.2.x here:
http://code.google.com/p/memcached/issues/detail?id=46

      #  Item_Size    Max_age  1MB_pages   Count  Full?  evicted  outofmem
    [... displaying only the interesting slabs w/ >1000 evictions ...]
     13     1.7 kB   177890 s         95   57855  yes      10017         0
     14     2.1 kB  1361999 s         67   32626  yes      73509         0
     15     2.6 kB   676795 s        230   89240  yes     132292         0
     16     3.3 kB   440497 s        235   72849  yes     122294         0
     17     4.1 kB  2244198 s       1465  363319  yes      34228         0
     18     5.2 kB    55159 s        718  142164  yes     379333         0
     19     6.4 kB    92563 s        392   61932  yes      72786         0
     20     8.1 kB    88771 s         81   10287  yes      21829         0
     21    10.1 kB   117322 s        502   50702  yes       2924         0
    [...]
    Total size (all slabs): 4109 MB
    Total size (slabs w/ > 1000 evictions): 3785 MB
    Memory Usage 89%

Same cache when I let my expire script run: memory usage goes down to about 8-10%.

> Instead of mass fetching, mass deleting, and doing other generally
> unscalable things... Do you have proof that your average cache item
> size changes enough to make this worth it?

Well, I think so:

1) I don't know where else the evictions would come from. Note: the cache is QUITE EMPTY when I run the expire.

2) See these memcached stats below: the cache was freshly restarted and is expired every 30 minutes. memcached itself says it is about 4% full, and it has now been running for 5 days. Our normal peak is around slabs #15-#20, but this cache shows considerable sizes in slabs #30+:

      #  Item_Size    Max_age  1MB_pages   Count  Full?  evicted  outofmem
      2      136 B      481 s          1     124  no           0         0
      3      176 B     1395 s          1     164  no           0         0
      4      224 B     1399 s          1     380  no           0         0
      5      280 B    73594 s          1      13  no           0         0
      6      352 B     5610 s          1     380  no           0         0
      7      440 B     1285 s          1     293  no           0         0
      8      552 B      854 s          1      18  no           0         0
      9      696 B     1389 s          1     145  no           0         0
     10      872 B     1399 s          2     439  no           0         0
     11     1.1 kB     1399 s          9    1793  no           0         0
     12     1.3 kB     1399 s          3     571  no           0         0
     13     1.7 kB     1396 s          1     186  no           0         0
     14     2.1 kB     1376 s          3     370  no           0         0
     15     2.6 kB     1396 s          1      90  no           0         0
     16     3.3 kB     1399 s          9     580  no           0         0
     17     4.1 kB     1396 s         19    1099  no           0         0
     18     5.2 kB     4693 s         19     991  yes          0         0
     19     6.4 kB     4688 s          6     680  no           0         0
     20     8.1 kB     4689 s          3     177  no           0         0
     21    10.1 kB     4380 s          1      25  no           0         0
     22    12.6 kB     1285 s          1      21  no           0         0
     23    15.8 kB     1077 s          1      20  no           0         0
     24    19.7 kB     4457 s          3      38  no           0         0
     25    24.6 kB     2176 s          3      21  no           0         0
     26    30.8 kB     1066 s          4      27  no           0         0
     27    38.5 kB     1375 s          5      46  no           0         0
     28    48.1 kB     1395 s         10      51  no           0         0
     29    60.2 kB     1333 s         12       4  no           0         0
     30    75.2 kB     1014 s         13       7  no           0         0
     31    94.0 kB     1393 s         27      69  no           0         0
     32   117.5 kB     1181 s         24       7  yes          0         0
     33   146.9 kB     1279 s         41       3  yes          0         0
     34   183.6 kB     1321 s         28       2  no           0         0
     35   229.5 kB        0 s          3       0  no           0         0
     36   286.9 kB     1350 s         27       7  no           0         0
     38   448.2 kB        0 s         84       0  yes          0         0
    Total size: 370 MB

> We *do* need slab reassignment. This doesn't exist presently and isn't a
> simple or free change, but almost everyone except some very massive sites
> can get away without needing it.

As I said before: there is a choice in building houses. You can build a perfect house out of stone with marble pillars, but if it is raining right now, I'd prefer a wooden shack, as long as it is dry. I'm just talking about the 90/10 method from before.

My solution, a slab expire in the server that normally does nothing and must be triggered externally, would IMHO be a good solution, as it:
+ would not affect anybody else (not running = no CPU cost)
+ would help those who need it (they pay for it with CPU cycles)
- would need an external script to trigger the expire (but only for those who need it).
  I, for my part, already have that script, and it is no work to change it to send "expire slab limit" instead of "stats cachedump"...
- would not be a perfect, golden, shimmering solution. But hey, who's perfect?

Regards,
Werner
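The external script mentioned above sends `stats cachedump` and then does what Dormando calls mass fetching. Below is a minimal Python sketch of that idea, resting on two assumptions I have not checked against every release. First, that `stats cachedump <slab> <limit>` answers with `ITEM <key> [...]` lines terminated by `END`. Second, that memcached unlinks an expired item when a `get` touches it, so the fetch itself does the expiring. All function names, the batch size of 100 and the default limit of 10000 are hypothetical choices. The proposed `expire slab limit` command does not exist; this only imitates it from the outside.

```python
import re
import socket

# Assumed reply format of "stats cachedump": "ITEM <key> [<n> b; <t> s]" lines, then "END".
ITEM_RE = re.compile(rb"^ITEM (\S+) \[")


def read_cachedump(reader):
    """Read one 'stats cachedump' reply from a binary reader; return the listed keys."""
    keys = []
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("connection closed in the middle of a reply")
        if line == b"END\r\n":
            return keys
        match = ITEM_RE.match(line)
        if match is None:
            raise RuntimeError(f"unexpected cachedump line: {line!r}")
        keys.append(match.group(1))


def drain_get(reader):
    """Consume one 'get' reply ("VALUE <key> <flags> <bytes>" blocks, then "END"); return the hit count."""
    hits = 0
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("connection closed in the middle of a reply")
        if line == b"END\r\n":
            return hits
        if not line.startswith(b"VALUE "):
            raise RuntimeError(f"unexpected get line: {line!r}")
        nbytes = int(line.split()[3])
        reader.read(nbytes + 2)  # skip the data block and its trailing \r\n
        hits += 1


def expire_slab(reader, writer, slab_id, limit=10000, batch=100):
    """Fetch every key cachedump lists for one slab; expired keys come back as misses.

    Returns (keys_listed, keys_still_alive).
    """
    writer.write(b"stats cachedump %d %d\r\n" % (slab_id, limit))
    writer.flush()
    keys = read_cachedump(reader)
    alive = 0
    for start in range(0, len(keys), batch):
        writer.write(b"get " + b" ".join(keys[start:start + batch]) + b"\r\n")
        writer.flush()
        alive += drain_get(reader)
    return len(keys), alive


def expire_all(host, port, slab_ids, limit=10000):
    """Run expire_slab for several slab classes over one connection; returns {slab: (listed, alive)}."""
    results = {}
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as reader, sock.makefile("wb") as writer:
            for slab_id in slab_ids:
                results[slab_id] = expire_slab(reader, writer, slab_id, limit)
    return results
```

Run from cron every 30 minutes, a call such as `expire_all("127.0.0.1", 11211, range(1, 40))` would cover slab classes 1 to 39. Host, port and slab range are placeholders, not values from the setup described here. Note the cost: every live value listed is sent back over the network and, as far as I know, also gets bumped in the LRU. That is exactly the overhead a server-side expire command would avoid.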
