Hi Dormando, more about the behaviour... when we're using "normal"
memcached 1.4.13, 16GB of memory gets exhausted in ~1h, then we start to
have almost instant evictions of needed items (again, these items aren't
really "needed" individually, it's just that when many of them get evicted
it's unacceptable because of how badly it affects the system).

~2 years ago I created another version based on that 1.4.13, one that does
garbage collection using a custom stats handler. That version is able to
run on half the memory for about 2 weeks, with 0 evictions. But we gave it
the full 16G and just restart it each week to be sure memory usage is kept
in check and we're not throwing away good data. Actually, after changing
-f1.25 to -f1.041 the slabs fill with bad items much more slowly, because
items are distributed better across slab classes and this custom eviction
function is able to catch more expired data. We have something like 200GB
of data reclaimed this way, daily. Because of the volume (~40k req/s peak,
much of it writes) and the differences in expire times, the LRU isn't able
to reclaim items efficiently.
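
To illustrate why the smaller factor helps (a minimal sketch, not actual
memcached source; the 96-byte base chunk size and 8-byte alignment are
just assumptions for illustration): each slab class chunk size is roughly
the previous one multiplied by -f, so a smaller factor produces many more,
narrower classes, and items of similar size end up grouped where the
expiry pass can find them:

#include <stdio.h>

/* Sketch: how the -f growth factor shapes slab class chunk sizes.
 * The 96-byte base and 8-byte alignment are illustrative assumptions;
 * real memcached also caps the total number of classes. */
static void print_classes(double factor) {
    double size = 96.0;                    /* assumed base chunk size */
    int classes = 0;
    while (size < 1024 * 1024) {           /* up to the 1MB page size */
        unsigned chunk = (((unsigned)size + 7) / 8) * 8; /* 8B align */
        if (++classes <= 4)                /* show the first few */
            printf("  class %d: %u bytes\n", classes, chunk);
        size *= factor;
    }
    printf("-f%.3f gives %d classes\n", factor, classes);
}

int main(void) {
    print_classes(1.25);   /* default: big jumps between classes */
    print_classes(1.041);  /* finer steps: similar items share a class */
    return 0;
}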

Maybe people don't even realize the problem, but when we did some testing
and turned off that "custom" eviction, we had ~100% of memory used with 10%
waste reported by the memcached admin stats. But when we ran that custom
eviction algorithm, it turned out that 90% of memory was occupied by
garbage. Reported waste grew to 80% instantly after running an unlimited
"reclaim expired" pass on all items in the cache. So on a stock memcached,
when people use different expire times for items (ours range from a
1-minute minimum to a 6h maximum)... they won't even be able to see how
much memory they're wasting in some specific cases, when they have many
items that won't be hit after expiration, like we have.
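
That invisibility comes from lazy expiration: the server only notices an
item is expired when something fetches it. A minimal self-contained sketch
of the pattern (not actual memcached code; the key name is made up):

#include <stdio.h>
#include <time.h>

/* Sketch of lazy expiration: the expiry check only runs on fetch, so
 * an item nobody asks for again keeps its memory and never shows up
 * in the waste statistics. */
typedef struct item { const char *key; time_t exptime; } item;

static item *item_get(item *it, time_t now) {
    if (it != NULL && it->exptime != 0 && it->exptime <= now) {
        /* counted as a "reclaim", but only because we touched it */
        printf("reclaiming expired key %s\n", it->key);
        return NULL;
    }
    return it;
}

int main(void) {
    time_t now = time(NULL);
    item stale = { "ML1-12345", now - 300 };  /* hypothetical key,
                                                 expired 5 minutes ago */
    /* until this call happens, the stale item still occupies memory */
    if (item_get(&stale, now) == NULL)
        printf("memory freed only on access, not in the background\n");
    return 0;
}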

When using memcached as a buffer for MySQL writes, we know exactly what to
hit and when. Short-TTL expired items pile up near the head... long-TTL
"live" items pile up near the tail, and that creates a barrier that
prevents the LRU algorithm from reclaiming almost anything, if I'm
understanding how it currently works correctly...

> You made it sound like you had some data which never expired? Is this
> true?

Yes, I think that's because of how evictions are made (to be clear, we're
not setting non-expiring items). These short-expiring items pile up at the
front of the linked list; something that is supposed to live for e.g. 120
or 180 seconds lingers in memory forever, until we restart the cache... and
new items are killed almost instantly because there are no expired items at
the end of the list where the eviction search looks.
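
For reference, here's a minimal self-contained sketch of that mechanism
(not memcached source; the search depth of 5 is an assumption modeled on
the small fixed limit in do_item_alloc in items.c): on allocation, the LRU
is only probed a few steps from the tail for something expired, so live
long-TTL items sitting there hide the expired short-TTL items further up:

#include <stdio.h>

/* Sketch of the short LRU tail search on allocation. */
typedef struct item { int exptime; struct item *prev; } item;

/* returns an expired item within `tries` steps of the tail, or NULL */
static item *tail_search(item *tail, int now, int tries) {
    for (item *search = tail; tries > 0 && search != NULL;
         tries--, search = search->prev) {
        if (search->exptime != 0 && search->exptime <= now)
            return search;        /* reclaimable: reuse its memory */
    }
    return NULL;                  /* nothing found: evict a live item */
}

int main(void) {
    /* tail end: five live long-TTL items; just past them, an item
     * that expired long ago -- the "barrier" described above */
    item expired = { 100, NULL };
    item live[5];
    for (int i = 0; i < 5; i++) {
        live[i].exptime = 1000000;               /* still live */
        live[i].prev = (i < 4) ? &live[i + 1] : &expired;
    }
    puts(tail_search(&live[0], 5000, 5)
         ? "reclaimed an expired item"
         : "no expired item near the tail: a good item gets evicted");
    return 0;
}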

It's a special case: after processing the memory list, aggregating the
data and putting it into MySQL, these items are no longer touched. The
list for the new time period will have a completely different set of keys,
as we use a prefix to generate all items in the list:

// one aggregation list per minute; $time_slices_back selects how far
// back to read (0 = the current minute's list)
$time_slice = floor(self::$time / 60) - $time_slices_back;
$prefix = ")ML){$list_id}-{$time_slice}";

Again, I'm not saying the current implementation is bad... it's fast and
doesn't thrash the CPU cache when expire times are ~equal, which was
probably the idea... but we have a not-so-typical use case which the LRU
isn't able to manage...

Now I'm making ~the same changes I made for .13... but for .17, and I want
to make it work a little better ;)



On Friday, April 11, 2014 at 05:12:10 UTC+2, Dormando wrote:
>
>
> > Hey Dormando, thanks again for some comments... appreciate the help.
> >
> > Maybe I wasn't clear enough. I need only 1 minute of persistence, and
> > I can lose data sometimes, I just can't keep losing data every minute
> > due to constant evictions caused by the LRU. Actually I just wrote
> > that in my previous post. We're losing about 1 minute of
> > non-meaningful data every week because of the restart that we do when
> > memory starts to fill up (even with our patch reclaiming using the
> > linked list, we limit reclaiming to keep speed acceptable)... so the
> > memory fills up after a week, not 30 minutes...
>
> Can you explain what you're seeing in more detail? Your data only needs
> to persist for 1 minute, but it's being evicted before 1 minute is up?
>
> You made it sound like you had some data which never expired? Is this
> true?
>
> If your instance is 16GB, takes a week to fill up, but data only needs
> to persist for a minute but isn't, something else is very broken? Or am
> I still misunderstanding you?
>
> > Now I'm creating a better solution, to limit locking as the linked
> > list is getting bigger.
> >
> > I explained the worst implications of unwanted evictions (or losing
> > all data in the cache) in my use case:
> > 1. losing ~1 minute of non-significant data that's about to be stored
> > in SQL
> > 2. "flat" distribution of load to workers (not taking response times
> > into account because of the stats reset).
> > 3. resorting to an alternative targeting algorithm (with global, not
> > local, statistics).
> >
> > I never, ever said I'm going to write data that has to be persistent
> > permanently. It's actually the same idea as delayed write. If power
> > fails you lose 5s of data, but you can do 100x more writes. So you
> > need the data to be persistent in memory, between writes the data
> > **can't be lost**. However you can lose it sometimes, that's the
> > tradeoff that some people can make and some can't. Obviously I can't
> > keep losing this data each minute, because if I lose much it'll
> > become meaningful.
> >
> > Maybe I wasn't clear on that matter. I can lose all data even 20
> > times a day. Sensitive data is stored using bulk updates or
> > transactions, bypassing that "delayed write" layer. "0 evictions",
> > that's the kind of "persistence" I'm going for. So items are
> > persistent for some very short periods of time (1-5 minutes) without
> > being killed. It's just a different use case. Running in production
> > for 2 years, based on 1.4.13, tested for correctness, monitored so we
> > have enough memory and 0 evictions (just reclaims).
> >
> > When I came here with the same idea ~2 years ago you just said it's
> > very stupid, now you even made me look like a moron :) And I can
> > understand why you don't want features that are not ~O(1) perfectly,
> > but please don't get so personal about different ideas of how to do
> > things and use cases, just because these won't work for you.
> >
> > On Thursday, April 10, 2014 at 20:53:12 UTC+2, Dormando wrote:
> >       You really really really really really *must* not put data in
> >       memcached which you can't lose.
> >
> >       Seriously, really don't do it. If you need persistence, try
> >       using a redis instance for the persistent stuff, and use
> >       memcached for your cache stuff. I don't see why you feel like
> >       you need to write your own thing, there're a lot of persistent
> >       key/value stores (kyotocabinet/etc?). They have a much lower
> >       request ceiling and don't handle the LRU/cache pattern as well,
> >       but that's why you can use both.
> >
> >       Again, please please don't do it. You are damaging your company.
> >       You are a *danger* to your company.
> > 
> >       On Thu, 10 Apr 2014, Slawomir Pryczek wrote: 
> > 
> >       > Hi Dormando, thanks for the suggestions, a background thread
> >       > would be nice... The idea is actually that with 2-3GB I get
> >       > plenty of evictions of items that need to be fetched later.
> >       > And with 16GB I still get evictions, actually I could probably
> >       > throw more memory than 16G at it and it'd only result in more
> >       > expired items sitting in the middle of slabs, forever... Now
> >       > I'm going for persistence. Sounds crazy probably, but we're
> >       > having some data that we can't lose:
> >       > 1. statistics, we aggregate writes to the DB using memcached
> >       > (+ list implementation). If these items get evicted we're
> >       > losing rows in the DB. Losing data sometimes isn't a big
> >       > problem. E.g. we restart memcached once a week so we're losing
> >       > 1 minute of data every week. But if we have evictions we're
> >       > losing data constantly (which we can't have)
> >       > 2. we drive a load balancer using data in memcached for
> >       > statistics, again, not nice to lose data often because workers
> >       > can get an incorrect amount of traffic.
> >       > 3. we're doing some ad-serving optimizations, e.g. counting
> >       > per-domain ad priority, for one domain it takes about 10
> >       > seconds to analyze all the data and create the list of ads, so
> >       > it can't be done online... we put the result of this in
> >       > memcached, if we lose too much of this the system will start
> >       > to serve suboptimal ads (because it'll need to switch to more
> >       > general data or a much simpler algorithm that can be done
> >       > instantly)
> >       >
> >       > Probably it would be best to rewrite all this in C or golang,
> >       > and use memcached just for caching, but it'd take too much
> >       > time which we don't have currently...
> >       >
> >       > I have seen the twitter and nk implementations that seem to do
> >       > what I need, but they seem old (based on old code), so I
> >       > prefer to modify the code of recent "official" memcached, to
> >       > not be stuck with old code or abandonware. Actually there are
> >       > many topics about the limitations of the current eviction
> >       > algo, and an option to enable some background thread to do
> >       > scraping based on statistics of the most-filled slabs (with
> >       > some parameter to specify whether it should take a light or
> >       > aggressive approach) would be nice...
> >       >
> >       > As for the code... is that the slab_rebalance_move function in
> >       > slabs.c? It seems a little difficult to grasp without some
> >       > docs of how things work... can you please write a very short
> >       > description of how this "angry birds" mode works?
> > 
> >       Look at doc/protocol.txt for explanations of the slab move
> >       options. The names are greppable back to the source.
> > 
> >       > I have a quick question about this below... ITEM_LINKED marks
> >       > an item that's placed on the linked list, but what do the
> >       > other flags mean, and why are the last 2 of them temporary?
> >       > #define ITEM_LINKED 1
> >       > #define ITEM_CAS 2
> >       >
> >       > /* temp */
> >       > #define ITEM_SLABBED 4
> >       > #define ITEM_FETCHED 8
> >       >
> >       > This from slab_rebalance_move seems interesting:
> >       > refcount = refcount_incr(&it->refcount);
> >       > ...
> >       > if (refcount == 1) { /* item is unlinked, unused */
> >       > ...
> >       > } else if (refcount == 2) { /* item is linked but not busy */
> >       >
> >       > Is there some doc about refcounts, locks and item states?
> >       > Basically, why is an item with refcount 2 not busy? You're
> >       > increasing the refcount by 1 on select, then again when
> >       > reading data? Can the refcount ever be higher than 2 (3 in the
> >       > above case), meaning 2 threads can access the same item?
> > 
> >       The comment on the same line is explaining exactly what it
> >       means.
> >
> >       Unfortunately it's a bit of a crap shoot. I think I wrote a
> >       threads explanation somewhere (some release notes, or in a file
> >       in there, I can't quite remember offhand). Since scaling the
> >       thread code it got a lot more complicated. You have to be
> >       extremely careful under what circumstances you access items (you
> >       must hold an item lock + the refcount must be 2 if you want to
> >       unlink it).
> >
> >       You'll just have to study it a bit, sorry. Grep around to see
> >       where the flags are used.
> > 
> >       > Thanks.
> >       >
> >       > On Thursday, April 10, 2014 at 06:05:30 UTC+2, Dormando wrote:
> >       >       > Hi Guys,
> >       >       > I'm running a specific case where I don't want to
> >       >       > have (actually can't have) evicted items (evictions =
> >       >       > 0 ideally)... now I have created some simple algo that
> >       >       > locks the cache, goes through the linked list and
> >       >       > evicts items... it causes some problems, like 10-20ms
> >       >       > cache locks in some cases.
> >       >       >
> >       >       > Now I'm thinking about going through each slab's
> >       >       > memory (slabs keep a list of allocated memory
> >       >       > regions)... looking for items, if an expired item is
> >       >       > found, evict it... this way I can go e.g. 10k items or
> >       >       > 1MB of memory at a time + pick slabs with high
> >       >       > utilization and run this "additional" eviction only on
> >       >       > them... so it'll prevent allocating memory just
> >       >       > because unneeded data with a short TTL is occupying
> >       >       > the HEAD of the list.
> >       >       >
> >       >       > With this linked-list eviction I'm able to run on
> >       >       > 2-3GB of memory... without it, 16GB of memory is
> >       >       > exhausted in 1-2h and then memcached starts to kill
> >       >       > "good" items (leaving expired ones wasting memory)...
> >       >       >
> >       >       > Any comments?
> >       >       > Thanks.
> >       > 
> >       >       you're going a bit against the base algorithm. if stuff
> >       >       is falling out of 16GB of memory without ever being
> >       >       utilized again, why is that critical? Sounds like you're
> >       >       optimizing the numbers instead of actually tuning
> >       >       anything useful.
> >       >
> >       >       That said, you can probably just extend the slab
> >       >       rebalance code. There's a hook in there (which I called
> >       >       "Angry birds mode") that drives a slab rebalance when
> >       >       it'd otherwise run an eviction. That code already safely
> >       >       walks the slab page for unlocked memory and frees it;
> >       >       you could edit it slightly to check for expiration and
> >       >       then freelist it into the slab class instead.
> >       >
> >       >       Since it's already a background thread you could further
> >       >       modify it to just wake up and walk pages for stuff to
> >       >       evict.
