A few different servers (5 of 205) experienced a segfault, all within an hour 
or so. Unfortunately, at this point I'm a bit out of my depth. I have the 
dmesg output, which is identical for all 5 boxes:

[46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]


I can possibly supply the binary file if needed, though we didn't do 
anything besides the standard setup and compile.
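
If it helps in the meantime: since the binary is mapped at 400000 (non-PIE), 
I believe the faulting ip can be fed straight to addr2line against the same 
build (assuming ./memcached is the exact binary that crashed; -f prints the 
function name):

    addr2line -f -e ./memcached 0x40e007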



On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
>
> If you look at the new branch there's a commit explaining the new stats. 
>
> You can watch slab_reassign_evictions vs slab_reassign_saves. You can also 
> test automove=1 vs automove=2 (please also turn on the lru_maintainer and 
> lru_crawler). 
>
> The initial branch you were running didn't add any new stats. It just 
> restored an old feature. 
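>
> For example, something along these lines to start it and watch the counters 
> (nc -w 1 just gives the read a timeout; stat names as above, assuming the 
> branch emits them under those names): 
>
>   memcached -m 1024 -o slab_reassign,slab_automove=2,lru_crawler,lru_maintainer 
>   echo stats | nc -w 1 127.0.0.1 11211 | grep slab_reassign 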
>
> On Tue, 29 Sep 2015, Scott Mansfield wrote: 
>
> > An unrelated prod problem meant I had to stop after about an hour. I'm 
> > turning it on again tomorrow morning. 
> > Are there any new metrics I should be looking at? Anything new in the 
> > stats output? I'm about to take a look at the diffs as well. 
> > 
> > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote: 
> >       excellent. if automove=2 is too aggressive you'll see that come in 
> >       as a hit ratio reduction. 
> > 
> >       the new branch works with automove=2 as well, but it will attempt to 
> >       rescue valid items in the old slab if possible. I'll still be 
> >       working on it for another few hours today though. I'll mail again 
> >       when I'm done. 
> > 
> >       On Tue, 29 Sep 2015, Scott Mansfield wrote: 
> > 
> >       > I have the first commit (slab_automove=2) running in prod right 
> >       > now. Later today will be a full-load production test of the latest 
> >       > code. I'll just let it run for a few days unless I spot any 
> >       > problems. We have good metrics for latency et al. from the client 
> >       > side, though network time normally dwarfs memcached time. 
> >       > 
> >       On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote: 
> >       >       That's unfortunate. 
> >       > 
> >       >       I've done some more work on the branch: 
> >       >       https://github.com/memcached/memcached/pull/112 
> >       > 
> >       >       It's not especially likely you would see enough of an 
> >       >       improvement from the new default mode. However, if your item 
> >       >       sizes change gradually, items are reclaimed during 
> >       >       expiration, or items get overwritten (and thus freed in the 
> >       >       old class), it should work just fine. I have another patch 
> >       >       coming which should help, though. 
> >       > 
> >       >       Open to feedback from any interested party. 
> >       > 
> >       >       On Fri, 25 Sep 2015, Scott Mansfield wrote: 
> >       > 
> >       >       > I have it running internally, and it runs fine under 
> >       >       > normal load. It's difficult to put it into the line of 
> >       >       > fire for a production workload because of social 
> >       >       > reasons... As well, it's a degenerate case that we normally 
> >       >       > don't run into (and actively try to avoid). I'm going to 
> >       >       > run some heavier load tests on it today. 
> >       >       > 
> >       >       > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote: 
> >       >       >       I'm working on getting a test going internally. 
> >       >       >       I'll let you know how it goes. 
> >       >       > 
> >       >       > 
> >       >       > Scott Mansfield 
> >       >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote: 
> >       >       >       Yo, 
> >       >       > 
> >       >       >       https://github.com/dormando/memcached/commits/slab_rebal_next 
> >       >       >       - would you mind playing around with the branch 
> >       >       >       here? You can see the start options in the test. 
> >       >       > 
> >       >       >       This is a dead simple modification (a restoration of 
> >       >       >       a feature that was already there...). The test very 
> >       >       >       aggressively writes and is able to shunt memory 
> >       >       >       around appropriately. 
> >       >       > 
> >       >       >       The work I'm exploring right now will allow saving 
> >       >       >       items in pages being rebalanced from, and increasing 
> >       >       >       the aggression of page moving without being so brain 
> >       >       >       damaged about it. 
> >       >       > 
> >       >       >       But while I'm poking around with that, I'd be 
> >       >       >       interested in knowing if this simple branch is an 
> >       >       >       improvement, and if so how much. 
> >       >       > 
> >       >       >       I'll push more code to the branch, but the changes 
> >       >       >       should be gated behind a feature flag. 
> >       >       > 
> >       >       >       On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote: 
> >       >       >       > 
> >       >       >       > No worries man, you're doing us a favor. Let me 
> >       >       >       > know if there's anything you need from us, and I 
> >       >       >       > promise I'll be quicker this time :) 
> >       >       >       > 
> >       >       >       > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote: 
> >       >       >       >       Hey, 
> >       >       >       > 
> >       >       >       >       I'm still really interested in working on 
> >       >       >       >       this. I'll be taking a careful look soon, I 
> >       >       >       >       hope. 
> >       >       >       > 
> >       >       >       >       On Mon, 3 Aug 2015, Scott Mansfield wrote: 
> >       >       >       > 
> >       >       >       >       > I've tweaked the program slightly, so I'm 
> >       >       >       >       > adding a new version. It prints more stats 
> >       >       >       >       > as it goes and runs a bit faster. 
> >       >       >       >       > 
> >       >       >       >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote: 
> >       >       >       >       >       Total brain fart on my part. 
> >       >       >       >       >       Apparently I had memcached 1.4.13 on 
> >       >       >       >       >       my path (who knows how...). Using the 
> >       >       >       >       >       actual one that I've built works. 
> >       >       >       >       >       Sorry for the confusion... can't 
> >       >       >       >       >       believe I didn't realize that before. 
> >       >       >       >       >       I'm testing against the compiled one 
> >       >       >       >       >       now to see how it behaves. 
> >       >       >       >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote: 
> >       >       >       >       >             You sure that's 1.4.24? None 
> >       >       >       >       >             of those fail for me :( 
> >       >       >       >       > 
> >       >       >       >             On Mon, 3 Aug 2015, Scott Mansfield wrote: 
> >       >       >       >             > 
> >       >       >       >             > The command line I've used that will start is: 
> >       >       >       >             > 
> >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove 
> >       >       >       >             > 
> >       >       >       >             > the ones that fail are: 
> >       >       >       >             > 
> >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer 
> >       >       >       >             > memcached -o lru_crawler 
> >       >       >       >             > 
> >       >       >       >             > I'm sure I've missed something during 
> >       >       >       >             > compile, though I just used 
> >       >       >       >             > ./configure and make. 
> >       >       >       >       >             > 
> >       >       >       >       >             > 
> >       >       >       >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote: 
> >       >       >       >             >       I've attached a pretty simple 
> >       >       >       >             >       program to connect, fill a slab 
> >       >       >       >             >       with data, and then fill 
> >       >       >       >             >       another slab slowly with data 
> >       >       >       >             >       of a different size. I've been 
> >       >       >       >             >       trying to get memcached to run 
> >       >       >       >             >       with the lru_crawler and 
> >       >       >       >             >       lru_maintainer flags, but I get 
> >       >       >       >             >       'Illegal suboption "(null)"' 
> >       >       >       >             >       every time I try to start with 
> >       >       >       >             >       either in any configuration. 
> >       >       >       >             > 
> >       >       >       >             >       I haven't seen it start to move 
> >       >       >       >             >       slabs automatically with a 
> >       >       >       >             >       freshly installed 1.4.24. 
> >       >       >       >       >             > 
> >       >       >       >       >             > 
> >       >       >       >             >       On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote: 
> >       >       >       >             >             I realize I've not given 
> >       >       >       >             >             you the tests to 
> >       >       >       >             >             reproduce the behavior. I 
> >       >       >       >             >             should be able to soon. 
> >       >       >       >             >             Sorry about the delay 
> >       >       >       >             >             here. 
> >       >       >       >             > In the meantime, I wanted to bring up a possible secondary use of 
> >       >       >       >             > the same logic to move items on slab rebalancing. I think the 
> >       >       >       >             > system might benefit from using the same logic to crawl the pages 
> >       >       >       >             > in a slab and compact the data in the background. In the case 
> >       >       >       >             > where we have memory that is assigned to the slab but not being 
> >       >       >       >             > used because of replaced or TTL'd out data, returning the memory 
> >       >       >       >             > to a pool of free memory will allow a slab to grow with that 
> >       >       >       >             > memory first instead of waiting for an event where memory is 
> >       >       >       >             > needed at that instant. 
> >       >       >       >             > 
> >       >       >       >             > It's a change in approach, from reactive to proactive. What do 
> >       >       >       >             > you think? 
> >       >       >       >       >             > 
> >       >       >       >             > On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote: 
> >       >       >       >             >       > First, more detail for you: 
> >       >       >       >             >       > 
> >       >       >       >             >       > We are running 1.4.24 in production and haven't noticed 
> >       >       >       >             >       > any bugs as of yet. The new LRUs seem to be working well, 
> >       >       >       >             >       > though we nearly always run memcached scaled to hold all 
> >       >       >       >             >       > data without evictions. Those with evictions are behaving 
> >       >       >       >             >       > well. Those without evictions haven't seen crashing or any 
> >       >       >       >             >       > other noticeable bad behavior. 
> >       >       >       >             > 
> >       >       >       >             >       Neat. 
> >       >       >       >             > 
> >       >       >       >             >       > OK, I think I see an area where I was speculating on 
> >       >       >       >             >       > functionality. If you have a key in slab 21 and then the 
> >       >       >       >             >       > same key is written again at a larger size in slab 23, I 
> >       >       >       >             >       > assumed that the space in 21 was not freed on the second 
> >       >       >       >             >       > write. With that assumption, the LRU crawler would not free 
> >       >       >       >             >       > up that space. Also, just by observation in the macro, the 
> >       >       >       >             >       > space is not freed fast enough to be effective, in our use 
> >       >       >       >             >       > case, to accept the writes that are happening. Think in the 
> >       >       >       >             >       > hundreds of millions of "overwrites" in a 6-10 hour period 
> >       >       >       >             >       > across a cluster. 
> >       >       >       >       >             > 
> >       >       >       >             >       Internally, "items" (a key/value pair) are generally 
> >       >       >       >             >       immutable. The only time they're not is for INCR/DECR, and 
> >       >       >       >             >       an item still becomes immutable if two INCR/DECRs collide. 
> >       >       >       >             > 
> >       >       >       >             >       What this means is that the new item is staged in a piece of 
> >       >       >       >             >       free memory while the "upload" stage of the SET happens. 
> >       >       >       >             >       When memcached has all of the data in memory to replace the 
> >       >       >       >             >       item, it does an internal swap under a lock. The old item is 
> >       >       >       >             >       removed from the hash table and LRU, and the new item gets 
> >       >       >       >             >       put in its place (at the head of the LRU). 
> >       >       >       >             > 
> >       >       >       >             >       Since items are refcounted, this means that if other users 
> >       >       >       >             >       are downloading an item which just got replaced, their 
> >       >       >       >             >       memory doesn't get corrupted by the item changing out from 
> >       >       >       >             >       underneath them. They can continue to read the old item 
> >       >       >       >             >       until they're done. When the refcount reaches zero the old 
> >       >       >       >             >       memory is reclaimed. 
> >       >       >       >             > 
> >       >       >       >             >       Most of the time, the item replacement happens and then the 
> >       >       >       >             >       old memory is immediately removed. 
> >       >       >       >             > 
> >       >       >       >             >       However, this does mean that you need *one* piece of free 
> >       >       >       >             >       memory to replace the old one. Then the old memory gets 
> >       >       >       >             >       freed after that set. 
> >       >       >       >             > 
> >       >       >       >             >       So if you take a memcached instance with 0 free chunks and 
> >       >       >       >             >       do a rolling replacement of all items (within the same slab 
> >       >       >       >             >       class as before), the first one would cause an eviction from 
> >       >       >       >             >       the tail of the LRU to get a free chunk. Every SET after 
> >       >       >       >             >       that would use the chunk freed by the replacement of the 
> >       >       >       >             >       previous item. 
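> >       >       >       >             > 
> >       >       >       >             >       (A toy model of that swap-under-a-lock, with made-up names 
> >       >       >       >             >       and structures; the real logic lives in items.c and does far 
> >       >       >       >             >       more bookkeeping:) 
> >       >       >       >             > 
> >       >       >       >             >         #include <pthread.h> 
> >       >       >       >             >         #include <stdlib.h> 
> >       >       >       >             > 
> >       >       >       >             >         struct item { 
> >       >       >       >             >             int refcount;  /* readers currently holding the item */ 
> >       >       >       >             >             /* key, value, hash/LRU links follow in the real thing */ 
> >       >       >       >             >         }; 
> >       >       >       >             > 
> >       >       >       >             >         static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER; 
> >       >       >       >             > 
> >       >       >       >             >         /* Drop one reference; free the chunk once nobody reads it. */ 
> >       >       >       >             >         static void item_release(struct item *it) { 
> >       >       >       >             >             pthread_mutex_lock(&cache_lock); 
> >       >       >       >             >             int dead = (--it->refcount == 0); 
> >       >       >       >             >             pthread_mutex_unlock(&cache_lock); 
> >       >       >       >             >             if (dead) 
> >       >       >       >             >                 free(it);  /* stands in for freeing the chunk */ 
> >       >       >       >             >         } 
> >       >       >       >             > 
> >       >       >       >             >         /* new_it is fully uploaded before the lock is taken, so 
> >       >       >       >             >          * readers see a complete old item or a complete new one. */ 
> >       >       >       >             >         static void item_replace(struct item **slot, struct item *new_it) { 
> >       >       >       >             >             pthread_mutex_lock(&cache_lock); 
> >       >       >       >             >             struct item *old_it = *slot; 
> >       >       >       >             >             *slot = new_it;        /* swap: unlink old, link new */ 
> >       >       >       >             >             pthread_mutex_unlock(&cache_lock); 
> >       >       >       >             >             item_release(old_it);  /* drop the table's reference */ 
> >       >       >       >             >         } 
> >       >       >       >             > 
> >       >       >       >             >         int main(void) { 
> >       >       >       >             >             struct item *old_it = calloc(1, sizeof(*old_it)); 
> >       >       >       >             >             struct item *new_it = calloc(1, sizeof(*new_it)); 
> >       >       >       >             >             old_it->refcount = 1;  /* the hash table's reference */ 
> >       >       >       >             >             struct item *slot = old_it; 
> >       >       >       >             >             item_replace(&slot, new_it); 
> >       >       >       >             >             free(new_it); 
> >       >       >       >             >             return 0; 
> >       >       >       >             >         } 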
> >       >       >       >       >             > 
> >       >       >       >             >       > After that last sentence I realized I also may not have 
> >       >       >       >             >       > explained the access pattern well enough. The keys are all 
> >       >       >       >             >       > overwritten every day, but it takes some time to write 
> >       >       >       >             >       > them all (obviously). We see a huge increase in the bytes 
> >       >       >       >             >       > metric as if the new data for the old keys was being 
> >       >       >       >             >       > written for the first time. Since the "old" slab for the 
> >       >       >       >             >       > same key doesn't proactively release memory, it starts to 
> >       >       >       >             >       > fill up the cache and then starts evicting data in the new 
> >       >       >       >             >       > slab. Once that happens, we see evictions in the old slab 
> >       >       >       >             >       > because of the algorithm you mentioned (random picking / 
> >       >       >       >             >       > freeing of memory). Typically we don't see any use for 
> >       >       >       >             >       > "upgrading" an item as the new data would be entirely new 
> >       >       >       >             >       > and should wholesale replace the old data for that key. 
> >       >       >       >             >       > More specifically, the operation is always set, with 
> >       >       >       >             >       > different data each day. 
> >       >       >       >             > 
> >       >       >       >             >       Right. Most of your problems will come from two areas. One 
> >       >       >       >             >       is that when you write data aggressively into the new slab 
> >       >       >       >             >       class (unless you set the rebalancer to always-replace 
> >       >       >       >             >       mode), the mover will make memory available more slowly than 
> >       >       >       >             >       you can insert. So you'll cause extra evictions in the new 
> >       >       >       >             >       slab class. 
> >       >       >       >             > 
> >       >       >       >             >       The secondary problem is the random evictions in the 
> >       >       >       >             >       previous slab class as stuff is chucked on the floor to make 
> >       >       >       >             >       memory moveable. 
> >       >       >       >       >             > 
> >       >       >       >             >       > As for testing, we'll be able to put it under real 
> >       >       >       >             >       > production workload. I don't know what kind of data you 
> >       >       >       >             >       > mean you need for testing. The data stored in the caches 
> >       >       >       >             >       > are highly confidential. I can give you all kinds of 
> >       >       >       >             >       > metrics, since we collect most of the ones that are in the 
> >       >       >       >             >       > stats output and some from the stats slabs output. If you 
> >       >       >       >             >       > have some specific ones that need collecting, I'll double 
> >       >       >       >             >       > check and make sure we can get those. Alternatively, it 
> >       >       >       >             >       > might be most beneficial to see the metrics in person :) 
> >       >       >       >             > 
> >       >       >       >             >       I just need stats snapshots here and there, and actually 
> >       >       >       >             >       putting the thing under load. When I did the LRU work I had 
> >       >       >       >             >       to beg for several months before anyone tested it with a 
> >       >       >       >             >       production load. This slows things down and demotivates me 
> >       >       >       >             >       from working on the project. 
> >       >       >       >             > 
> >       >       >       >             >       Unfortunately my dayjob keeps me pretty busy, so ~internet~ 
> >       >       >       >             >       would probably be best. 
> >       >       >       >       >             > 
> >       >       >       >             >       > I can create a driver program to reproduce the behavior 
> >       >       >       >             >       > on a smaller scale. It would write e.g. 10k keys of 10k 
> >       >       >       >             >       > size, then rewrite the same keys with different size data. 
> >       >       >       >             >       > I'll work on that and post it to this thread when I can 
> >       >       >       >             >       > reproduce the behavior locally. 
> >       >       >       >       >             > 
> >       >       >       >             >       Ok. There're slab rebalance unit tests in the t/ directory 
> >       >       >       >             >       which do things like this, and I've used mc-crusher to slam 
> >       >       >       >             >       the rebalancer. It's pretty easy to run one config to load 
> >       >       >       >             >       up 10k objects, then flip to the other using the same key 
> >       >       >       >             >       namespace. 
> >       >       >       >       >             > 
> >       >       >       >       >             >       > Thanks, 
> >       >       >       >       >             >       > Scott 
> >       >       >       >       >             >       > 
> >       >       >       >             >       > On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote: 
> >       >       >       >             >       >       Hey, 
> >       >       >       >             >       > 
> >       >       >       >             >       >       On Fri, 10 Jul 2015, Scott Mansfield wrote: 
> >       >       >       >             >       > 
> >       >       >       >             >       >       > We've seen issues recently where we run a cluster 
> >       >       >       >             >       >       > that typically has the majority of items 
> >       >       >       >             >       >       > overwritten in the same slab every day, and a 
> >       >       >       >             >       >       > sudden change in data size evicts a ton of data, 
> >       >       >       >             >       >       > affecting downstream systems. To be clear, that is 
> >       >       >       >             >       >       > our problem, but I think there's a tweak in 
> >       >       >       >             >       >       > memcached that might be useful and another possible 
> >       >       >       >             >       >       > feature that would be even better. 
> >       >       >       >             >       >       > The data that is written to this cache is 
> >       >       >       >             >       >       > overwritten every day, though the TTL is 7 days. 
> >       >       >       >             >       >       > One slab takes up the majority of the space in the 
> >       >       >       >             >       >       > cache. The application wrote e.g. 10KB (slab 21) 
> >       >       >       >             >       >       > every day for each key consistently. One day, a 
> >       >       >       >             >       >       > change occurred where it started writing 15KB (slab 
> >       >       >       >             >       >       > 23), causing a migration of data from one slab to 
> >       >       >       >             >       >       > another. We had -o slab_reassign,slab_automove=1 
> >       >       >       >             >       >       > set on the server, causing large numbers of 
> >       >       >       >             >       >       > evictions on the initial slab. Let's say the cache 
> >       >       >       >             >       >       > could hold the data at 15KB per key, but the old 
> >       >       >       >             >       >       > data was not technically TTL'd out in its old slab. 
> >       >       >       >             >       >       > This means that memory was not being freed by the 
> >       >       >       >             >       >       > lru crawler thread (I think) because its expiry had 
> >       >       >       >             >       >       > not come around. 
> >       >       >       >             >       >       > 
> >       >       >       >             >       >       > lines 1199 and 1200 in items.c: 
> >       >       >       >             >       >       > if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) { 
> >       >       >       >             >       >       > 
> >       >       >       >             >       >       > If there was a check to see if this data was 
> >       >       >       >             >       >       > "orphaned," i.e. that the key, if accessed, would 
> >       >       >       >             >       >       > map to a different slab than the current one, then 
> >       >       >       >             >       >       > these orphans could be reclaimed as free memory. I 
> >       >       >       >             >       >       > am working on a patch to do this, though I have 
> >       >       >       >             >       >       > reservations about performing a hash on the key on 
> >       >       >       >             >       >       > the lru crawler thread (if the hash is not already 
> >       >       >       >             >       >       > available). 
> >       >       >       >             >       >       > I have very little experience in the memcached 
> >       >       >       >             >       >       > codebase so I don't know the most efficient way to 
> >       >       >       >             >       >       > do this. Any help would be appreciated. 
> >       >       >       >       >             >       > 
> >       >       >       >             >       >       There seems to be a misconception about how the slab 
> >       >       >       >             >       >       classes work. A key, if already existing in a slab, 
> >       >       >       >             >       >       will always map to the slab class it currently fits 
> >       >       >       >             >       >       into. The slab classes always exist, but the amount 
> >       >       >       >             >       >       of memory reserved for each of them will shift with 
> >       >       >       >             >       >       slab_reassign. ie: 10 pages in slab class 21, then 
> >       >       >       >             >       >       memory pressure on 23 causes it to move over. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       So if you examine a key that still exists in slab 
> >       >       >       >             >       >       class 21, it has no reason to move up or down the 
> >       >       >       >             >       >       slab classes. 
> >       >       >       >       >             >       > 
> >       >       >       >             >       >       > Alternatively, and possibly more beneficial, is 
> >       >       >       >             >       >       > compaction of data in a slab using the same set of 
> >       >       >       >             >       >       > criteria as lru crawling. Understandably, 
> >       >       >       >             >       >       > compaction is a very difficult problem to solve 
> >       >       >       >             >       >       > since moving the data would be a pain in the ass. I 
> >       >       >       >             >       >       > saw a couple of discussions about this in the 
> >       >       >       >             >       >       > mailing list, though I didn't see any firm thoughts 
> >       >       >       >             >       >       > about it. I think it can probably be done in O(1) 
> >       >       >       >             >       >       > like the lru crawler by limiting the number of 
> >       >       >       >             >       >       > items it touches each time. Writing and reading are 
> >       >       >       >             >       >       > doable in O(1) so moving should be as well. Has 
> >       >       >       >             >       >       > anyone given more thought to compaction? 
> >       >       >       >             >       > 
> >       >       >       >             >       >       I'd be interested in hacking this up for you folks 
> >       >       >       >             >       >       if you can provide me testing and some data to work 
> >       >       >       >             >       >       with. With all of the LRU work I did in 1.4.24, the 
> >       >       >       >             >       >       next thing I wanted to do is a big improvement on 
> >       >       >       >             >       >       the slab reassignment code. 
> >       >       >       >       >             >       > 
> >       >       >       >             >       >       Currently it picks essentially a random slab page, 
> >       >       >       >             >       >       empties it, and moves the slab page into the class 
> >       >       >       >             >       >       under pressure. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       One thing we can do is first examine for free memory 
> >       >       >       >             >       >       in the existing slab, IE: 
> >       >       >       >             >       > 
> >       >       >       >             >       >       - Take a page from slab 21 
> >       >       >       >             >       >       - Scan the page for valid items which need to be moved 
> >       >       >       >             >       >       - Pull free memory from slab 21, migrate the item 
> >       >       >       >             >       >         (moderately complicated) 
> >       >       >       >             >       >       - When the page is empty, move it (or give up if you 
> >       >       >       >             >       >         run out of free chunks). 
> >       >       >       >             >       > 
> >       >       >       >             >       >       The next step is to pull from the LRU on slab 21: 
> >       >       >       >             >       > 
> >       >       >       >             >       >       - Take page from slab 21 
> >       >       >       >             >       >       - Scan page for valid items 
> >       >       >       >             >       >       - Pull free memory from slab 21, migrate the item 
> >       >       >       >             >       >         - If no memory free, evict tail of slab 21; use 
> >       >       >       >             >       >           that chunk. 
> >       >       >       >             >       >       - When the page is empty, move it. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       Then, when you hit this condition, your 
> >       >       >       >             >       >       least-recently-used data gets culled as new data 
> >       >       >       >             >       >       migrates your page class. This should match a 
> >       >       >       >             >       >       natural occurrence if you would already be evicting 
> >       >       >       >             >       >       valid (but old) items to make room for new items. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       A bonus to using the free memory trick is that I can 
> >       >       >       >             >       >       use the amount of free space in a slab class as a 
> >       >       >       >             >       >       heuristic to more quickly move slab pages around. 
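> >       >       >       >             >       > 
> >       >       >       >             >       >       (A toy of that second loop, with invented 
> >       >       >       >             >       >       structures; the real mover also has to juggle locks, 
> >       >       >       >             >       >       refcounts, and hash table / LRU relinking:) 
> >       >       >       >             >       > 
> >       >       >       >             >       >         #include <string.h> 
> >       >       >       >             >       > 
> >       >       >       >             >       >         #define CHUNKS_PER_PAGE 8 
> >       >       >       >             >       > 
> >       >       >       >             >       >         struct chunk { int in_use; char data[32]; }; 
> >       >       >       >             >       > 
> >       >       >       >             >       >         struct slabclass { 
> >       >       >       >             >       >             struct chunk page[CHUNKS_PER_PAGE]; /* to vacate */ 
> >       >       >       >             >       >             struct chunk *freelist[CHUNKS_PER_PAGE]; 
> >       >       >       >             >       >             int nfree; 
> >       >       >       >             >       >         }; 
> >       >       >       >             >       > 
> >       >       >       >             >       >         /* Destination chunk: free memory first; the real 
> >       >       >       >             >       >          * code would evict the LRU tail as a fallback. */ 
> >       >       >       >             >       >         static struct chunk *get_dst(struct slabclass *sc) { 
> >       >       >       >             >       >             if (sc->nfree > 0) 
> >       >       >       >             >       >                 return sc->freelist[--sc->nfree]; 
> >       >       >       >             >       >             return NULL; 
> >       >       >       >             >       >         } 
> >       >       >       >             >       > 
> >       >       >       >             >       >         /* Migrate each valid item into another chunk of 
> >       >       >       >             >       >          * the same class; once empty, the page can move to 
> >       >       >       >             >       >          * the class under pressure. Returns 1 on success. */ 
> >       >       >       >             >       >         static int vacate_page(struct slabclass *sc) { 
> >       >       >       >             >       >             for (int i = 0; i < CHUNKS_PER_PAGE; i++) { 
> >       >       >       >             >       >                 if (!sc->page[i].in_use) 
> >       >       >       >             >       >                     continue; /* dead/expired: nothing to do */ 
> >       >       >       >             >       >                 struct chunk *dst = get_dst(sc); 
> >       >       >       >             >       >                 if (dst == NULL) 
> >       >       >       >             >       >                     return 0; /* give up for this pass */ 
> >       >       >       >             >       >                 memcpy(dst->data, sc->page[i].data, 
> >       >       >       >             >       >                        sizeof(dst->data)); 
> >       >       >       >             >       >                 dst->in_use = 1; /* relink hash/LRU in reality */ 
> >       >       >       >             >       >                 sc->page[i].in_use = 0; 
> >       >       >       >             >       >             } 
> >       >       >       >             >       >             return 1; /* page is empty; reassign it */ 
> >       >       >       >             >       >         } 
> >       >       >       >             >       > 
> >       >       >       >             >       >         int main(void) { 
> >       >       >       >             >       >             static struct slabclass sc; 
> >       >       >       >             >       >             static struct chunk spare[CHUNKS_PER_PAGE]; 
> >       >       >       >             >       >             sc.nfree = CHUNKS_PER_PAGE; 
> >       >       >       >             >       >             for (int i = 0; i < CHUNKS_PER_PAGE; i++) 
> >       >       >       >             >       >                 sc.freelist[i] = &spare[i]; 
> >       >       >       >             >       >             sc.page[3].in_use = 1; /* one live item */ 
> >       >       >       >             >       >             return vacate_page(&sc) ? 0 : 1; 
> >       >       >       >             >       >         } 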
> >       >       >       >       >             >       > 
> >       >       >       >             >       >       If it's still necessary from there, we can explore 
> >       >       >       >             >       >       "upgrading" items to a new slab class, but that is 
> >       >       >       >             >       >       much, much more complicated since the item has to 
> >       >       >       >             >       >       shift LRUs. Do you put it at the head, the tail, the 
> >       >       >       >             >       >       middle, etc? It might be impossible to make a good 
> >       >       >       >             >       >       generic decision there. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       What version are you currently on? If 1.4.24, have 
> >       >       >       >             >       >       you seen any instability? I'm currently torn between 
> >       >       >       >             >       >       fighting a few bugs and starting on improving the 
> >       >       >       >             >       >       slab rebalancer. 
> >       >       >       >             >       > 
> >       >       >       >             >       >       -Dormando 
> >       >       >       >       >             >       > 
> >       >       >       >       >             >       > 
