Hey Dormando, thanks again for some comments... appreciate the help.

Maybe I wasn't clear enough. I only need about 1 minute of persistence, and 
I can lose data sometimes; I just can't keep losing data every minute due to 
constant evictions caused by the LRU. I actually wrote exactly that in my 
previous post. We're losing about 1 minute of non-meaningful data every week 
because of the restart we do when memory starts to fill up (even with our 
patch that reclaims expired items by walking the linked list, we limit 
reclaiming to keep speed reasonable)... so the memory fills up after a week, 
not after 30 minutes...

Now I'm working on a better solution, to limit locking as the linked list 
gets bigger.

I explained the worst implications of unwanted evictions (or of losing all 
data in the cache) for my use case:
1. losing ~1 minute of non-significant data that's about to be stored in 
SQL
2. a "flat" distribution of load to workers (not taking response times into 
account, because the stats were reset)
3. falling back to an alternative targeting algorithm (with global, not 
local, statistics)

I never, ever said I'm going to write data that has to be persistent 
permanently. It's actually the same idea as delayed writes. If power fails 
you lose 5s of data, but you can do 100x more writes. So you need the data 
to stay in memory between writes; in that window the data **can't be 
lost**. However, you can lose it sometimes; that's a tradeoff some people 
can make and some can't. Obviously I can't keep losing this data every 
minute, because if I lose too much of it the loss becomes meaningful.
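To make the pattern concrete, here is a minimal sketch of that "delayed 
write" layer using libmemcached. The buffer key name, the TTL and the flush 
step are placeholders for illustration, not our actual code:

/* Events are appended to one buffer key in memcached and flushed to SQL
 * roughly once a minute; a crash loses at most one flush interval. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_KEY "stats:pending"   /* hypothetical buffer key */
#define BUF_TTL 300               /* > flush interval, so it reclaims, not evicts */

static void record_event(memcached_st *mc, const char *line)
{
    /* Append one line; if the key doesn't exist yet, create it.
     * (A real version would handle the small create/append race.) */
    if (memcached_append(mc, BUF_KEY, strlen(BUF_KEY),
                         line, strlen(line), BUF_TTL, 0) != MEMCACHED_SUCCESS)
        memcached_set(mc, BUF_KEY, strlen(BUF_KEY),
                      line, strlen(line), BUF_TTL, 0);
}

static void flush_to_sql(memcached_st *mc)
{
    size_t len; uint32_t flags; memcached_return_t rc;
    char *buf = memcached_get(mc, BUF_KEY, strlen(BUF_KEY), &len, &flags, &rc);
    if (buf == NULL)
        return;                   /* nothing buffered, or it was evicted/lost */
    memcached_delete(mc, BUF_KEY, strlen(BUF_KEY), 0);
    /* ... bulk INSERT the buffered lines into SQL here ... */
    printf("flushing %zu bytes\n", len);
    free(buf);
}

The point is only that evictions of that buffer key translate directly into 
lost rows, which is why "0 evictions" matters far more to me than durable 
persistence.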

Maybe I wasn't clear on that. I can lose all the data even 20 times a day. 
Sensitive data is stored using bulk updates or transactions, bypassing that 
"delayed write" layer. "0 evictions" is the kind of "persistence" I'm going 
for. So items are persistent for some very short period of time (1-5 
minutes) without being killed. It's just a different use case. It has been 
running in production for 2 years, based on 1.4.13, tested for correctness, 
and monitored so that we have enough memory and 0 evictions (just reclaims).

When I came here with the same idea ~2 years ago you just said it's very 
stupid, and now you've even made me look like a moron :) I can understand 
why you don't want features that aren't perfectly ~O(1), but please don't 
get so personal about different ways of doing things and different use 
cases, just because they wouldn't work for you.





On Thursday, April 10, 2014 at 20:53:12 UTC+2, Dormando wrote:
>
> You really really really really really *must* not put data in memcached 
> which you can't lose. 
>
> Seriously, really don't do it. If you need persistence, try using a redis 
> instance for the persistent stuff, and use memcached for your cache stuff. 
> I don't see why you feel like you need to write your own thing, there're a 
> lot of persistent key/value stores (kyotocabinet/etc?). They have a much 
> lower request ceiling and don't handle the LRU/cache pattern as well, but 
> that's why you can use both. 
>
> Again, please please don't do it. You are damaging your company. You are a 
> *danger* to your company. 
>
> On Thu, 10 Apr 2014, Slawomir Pryczek wrote: 
>
> > Hi Dormando, thanks for the suggestions, a background thread would be 
> > nice... The idea is that with 2-3GB I get plenty of evictions of items 
> > that need to be fetched later. And with 16GB I still get evictions; 
> > actually I could probably throw even more memory than 16G at it, and it'd 
> > only result in more expired items sitting in the middle of slabs, 
> > forever... Now I'm going for persistence. It probably sounds crazy, but 
> > we have some data that we can't lose: 
> > 1. statistics: we aggregate writes to the DB using memcached (+ a list 
> > implementation). If these items get evicted we lose rows in the DB. 
> > Losing data occasionally isn't a big problem, e.g. we restart memcached 
> > once a week, so we lose 1 minute of data every week. But if we have 
> > evictions we're losing data constantly (which we can't have). 
> > 2. we drive a load balancer using statistics kept in memcached; again, 
> > it's not nice to lose that data often, because workers can get an 
> > incorrect amount of traffic. 
> > 3. we're doing some ad-serving optimizations, e.g. computing per-domain 
> > ad priority. For one domain it takes about 10 seconds to analyze all the 
> > data and create the list of ads, so it can't be done online... we put the 
> > result of this in memcached; if we lose too much of it the system will 
> > start serving suboptimal ads (because it'll need to switch to more 
> > general data or a much simpler algorithm that can run instantly). 
> > 
> > Probably it would be best to rewrite all this in C or golang and use 
> > memcached just for caching, but that would take too much time, which we 
> > don't have right now... 
> > 
> > I have seen the twitter and nk implementations that seem to do what I 
> > need, but they look old (based on old code), so I'd rather modify recent 
> > "official" memcached code, to avoid being stuck with old code or 
> > abandonware. Actually there are many threads about the limitations of the 
> > current eviction algorithm, and an option to enable a background thread 
> > that scrapes based on statistics of the most-filled slabs (with a 
> > parameter choosing a light or aggressive approach) would be nice... 
> > 
> > As for the code... is that the slab_rebalance_move function in slabs.c? 
> > It seems a little difficult to grasp without some docs on how things 
> > work... could you please write a very short description of how this 
> > "angry birds" mode works? 
>
> Look at doc/protocol.txt for explanations of the slab move options. The 
> names are greppable back to the source. 
>
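(For reference, the relevant knobs described in doc/protocol.txt look roughly 
like the following; check your version's copy for the exact wording and 
defaults:)

-o slab_reassign               - startup option: allow pages to move between slab classes
-o slab_automove=2             - startup option: aggressive ("angry birds") mode,
                                 reassign a page when an eviction would otherwise happen
slabs reassign <src> <dst>     - runtime command: move one page between slab classes
slabs automove <0|1|2>         - runtime command: change the automover mode
stats slabs                    - per-class page/chunk usage, useful for picking targets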
> > I have a quick question about the above... ITEM_LINKED means an item 
> > that's placed on the linked list, but what do the other flags mean, and 
> > why are the last 2 of them temporary? 
> > #define ITEM_LINKED 1 
> > #define ITEM_CAS 2 
> > 
> > /* temp */ 
> > #define ITEM_SLABBED 4 
> > #define ITEM_FETCHED 8 
> > 
> > This from slab_rebalance_move seems interesting: 
> > refcount = refcount_incr(&it->refcount); 
> > ... 
> > if (refcount == 1) { /* item is unlinked, unused */ 
> > ... 
> > } else if (refcount == 2) { /* item is linked but not busy */ 
> > 
> > Are there any docs about refcounts, locks and item states? Basically, why 
> > is an item with refcount 2 not busy? Do you increase the refcount by 1 on 
> > select, then again when reading the data? Can the refcount ever be higher 
> > than 2 (3 in the case above), meaning 2 threads can access the same item? 
>
> The comment on the same line is explaining exactly what it means. 
>
> Unfortunately it's a bit of a crap shoot. I think I wrote a threads 
> explanation somewhere (some release notes, or a file in the tree, I can't 
> quite remember offhand). Since the thread code was scaled up it has gotten 
> a lot more complicated. You have to be extremely careful about the 
> circumstances under which you access items (you must hold an item lock + 
> the refcount must be 2 if you want to unlink it). 
>
> You'll just have to study it a bit, sorry. Grep around to see where the 
> flags are used. 
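(To illustrate the refcount convention in a self-contained way: while an item 
is linked, the hash table/LRU itself accounts for one reference, and each 
worker currently using the item adds one more. So after the mover takes its 
own reference, seeing exactly 2 means "linked but idle". A toy sketch of that 
check; the names and types here are mine, not memcached's:)

#include <stdatomic.h>
#include <stdbool.h>

struct toy_item {
    atomic_uint refcount;       /* 1 = linked only, >= 2 = linked and in use */
    bool linked;
};

/* Called with the per-bucket item lock already held (not shown). */
static bool try_unlink_if_idle(struct toy_item *it)
{
    unsigned ref = atomic_fetch_add(&it->refcount, 1) + 1;  /* take our reference */
    bool unlinked = false;
    if (ref == 2 && it->linked) {
        it->linked = false;                 /* drop the hash table/LRU reference */
        atomic_fetch_sub(&it->refcount, 1);
        unlinked = true;
    }
    atomic_fetch_sub(&it->refcount, 1);     /* release our own reference */
    return unlinked;
}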
>
> > Thanks. 
> > 
> > On Thursday, April 10, 2014 at 06:05:30 UTC+2, Dormando wrote: 
> >       > Hi Guys, 
> >       > I'm running a specific case where I don't want (actually can't 
> >       > have) evicted items (evictions = 0 ideally)... I have created a 
> >       > simple algorithm that locks the cache, goes through the linked 
> >       > list and evicts items... it causes some problems, like 10-20ms 
> >       > cache locks in some cases. 
> >       > 
> >       > Now I'm thinking about going through each slab's memory (slabs 
> >       > keep a list of allocated memory regions)... looking for items; 
> >       > if an expired item is found, evict it... this way I can go e.g. 
> >       > 10k items or 1MB of memory at a time + pick slabs with high 
> >       > utilization and run this "additional" eviction only on them... 
> >       > so it'll prevent allocating memory just because unneeded data 
> >       > with a short TTL is occupying the HEAD of the list. 
> >       > 
> >       > With this linked-list eviction I'm able to run on 2-3GB of 
> >       > memory... without it, 16GB of memory is exhausted in 1-2h and 
> >       > then memcached starts to kill "good" items (leaving expired 
> >       > ones wasting memory)... 
> >       > 
> >       > Any comments? 
> >       > Thanks. 
> > 
> >       you're going a bit against the base algorithm. if stuff is falling 
> >       out of 16GB of memory without ever being utilized again, why is 
> >       that critical? Sounds like you're optimizing the numbers instead 
> >       of actually tuning anything useful. 
> > 
> >       That said, you can probably just extend the slab rebalance code. 
> >       There's a hook in there (which I called "Angry birds mode") that 
> >       drives a slab rebalance when it'd otherwise run an eviction. That 
> >       code already safely walks the slab page for unlocked memory and 
> >       frees it; you could edit it slightly to check for expiration and 
> >       then freelist it into the slab class instead. 
> > 
> >       Since it's already a background thread you could further modify it 
> >       to just wake up and walk pages for stuff to evict. 
> > 
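(The "walk the slab page and freelist expired chunks" idea above, as a toy 
self-contained sketch. The chunk layout, the free list and the expiry check 
are invented for illustration; memcached's real item/slab structures and the 
locking around them are more involved:)

#include <stddef.h>
#include <time.h>

struct chunk {
    time_t exptime;             /* 0 = never expires */
    int    in_use;
    struct chunk *next_free;
    char   data[];              /* key + value would live here */
};

static struct chunk *free_head; /* per-class free list in the real thing */

/* Scan one page of fixed-size chunks and push expired ones back onto the
 * free list instead of letting them rot in the middle of the page. */
static size_t reclaim_page(void *page, size_t page_bytes, size_t chunk_size,
                           time_t now)
{
    size_t reclaimed = 0;
    for (size_t off = 0; off + chunk_size <= page_bytes; off += chunk_size) {
        struct chunk *c = (struct chunk *)((char *)page + off);
        if (c->in_use && c->exptime != 0 && c->exptime <= now) {
            c->in_use = 0;
            c->next_free = free_head;
            free_head = c;
            reclaimed++;
        }
    }
    return reclaimed;
}

A background thread would pick the fullest or most-fragmented classes (e.g. 
from "stats slabs"-style counters) and call something like this on a budget 
per wakeup, which matches the "10k items or 1MB at a time" limit described 
above.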
