The same cluster has > 400 servers happily running 1.4.24. It's been our 
standard deployment for a while now, and we haven't seen any crashes. The 
servers in the same cluster running 1.4.24 (with the same write load the 
new build was taking) have been up for 29 days. The start options do not 
contain the slab_automove option because it wasn't effective for us before. 
The memory given is possibly slightly different per server, as we calculate 
on startup how much we give. It's in the same ballpark, though (~56 gigs).

On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> Just before I sit in and try to narrow this down: have you run any host
> on 1.4.24 mainline with those same start options? Just in case the
> crash is older.

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> Another message for you:
>
> [78098.528606] traps: memcached[2757] general protection ip:412b9d
> sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
>
> addr2line shows:
>
> $ addr2line -e memcached 412b9d
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> Ok, thanks!
>
> I'll noodle this a bit... unfortunately a backtrace might be more
> helpful. I'll ask you to attempt to get one if I don't figure anything
> out in time.
>
> (Allow it to core dump, or attach a GDB session, set an ignore handler
> for sigpipe/int/etc., and run "continue".)
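>
> (Illustratively, an attach-and-wait session might look like this --
> treat it as a sketch, not exact commands:)
>
>   $ gdb -p $(pidof memcached)
>   (gdb) handle SIGPIPE nostop noprint pass
>   (gdb) handle SIGINT nostop noprint pass
>   (gdb) continue
>   ... wait for the crash ...
>   (gdb) bt full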
>
> What were your full startup args, though?

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
> Also, this was running with production writes, but not reads. Even if
> we had reads on with the few servers crashing, we're OK architecturally.
> That's why I can get it out there without worrying too much. For now,
> I'm going to turn it off. I had a metrics issue anyway that needs to
> get fixed. Tomorrow I'm planning to test again with more metrics, but I
> can get any new code in pretty quickly.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> How many servers were you running it on? I hope it wasn't more than a
> handful. I'd recommend starting with one :P
>
> Can you do an addr2line? What were your startup args, and what was the
> commit sha1 for the branch you pulled?
>
> Sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> A few different servers (5 of 205) experienced a segfault, all within
> an hour or so. Unfortunately, at this point I'm a bit out of my depth.
> I have the dmesg output, which is identical for all 5 boxes:
>
> [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp
> 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>
> I can possibly supply the binary file if needed, though we didn't do
> anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> If you look at the new branch, there's a commit explaining the new
> stats.
>
> You can watch slab_reassign_evictions vs slab_reassign_saves. You can
> also test automove=1 vs automove=2 (please also turn on the
> lru_maintainer and lru_crawler).
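>
> (E.g., something along these lines -- the -m value is just a
> placeholder for whatever you normally compute:
>
>   memcached -m 56000 -o slab_reassign,slab_automove=2,lru_crawler,lru_maintainer
>
> then sample the counters periodically:
>
>   echo stats | nc -w 1 127.0.0.1 11211 | grep slab_reassign
>
> to compare evictions against saves.)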
>
> The initial branch you were running didn't add any new stats. It just
> restored an old feature.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> An unrelated prod problem meant I had to stop after about an hour. I'm
> turning it on again tomorrow morning. Are there any new metrics I
> should be looking at? Anything new in the stats output? I'm about to
> take a look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> Excellent. If automove=2 is too aggressive, you'll see it show up as a
> hit-ratio reduction.
>
> The new branch works with automove=2 as well, but it will attempt to
> rescue valid items in the old slab if possible. I'll still be working
> on it for another few hours today, though. I'll mail again when I'm
> done.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> I have the first commit (slab_automove=2) running in prod right now.
> Later today will be a full-load production test of the latest code.
> I'll just let it run for a few days unless I spot any problems. We have
> good metrics for latency et al. from the client side, though network
> normally dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> That's unfortunate.
>
> I've done some more work on the branch:
> https://github.com/memcached/memcached/pull/112
>
> You may not see that much of an improvement from the new default mode.
> However, if your item sizes change gradually, items are reclaimed
> during expiration, or items get overwritten (and thus freed in the old
> class), it should work just fine. I have another patch coming which
> should help, though.
>
> Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:
> I have it running internally, and it runs fine under normal load. It's
> difficult to put it into the line of fire for a production workload
> because of social reasons... Also, it's a degenerate case that we
> normally don't run into (and actively try to avoid). I'm going to run
> some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> I'm working on getting a test going internally. I'll let you know how
> it goes.
>
> Scott Mansfield

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next - would
> you mind playing around with the branch here? You can see the start
> options in the test.
>
> This is a dead-simple modification (a restoration of a feature that
> was already there...). The test writes very aggressively and is able
> to shunt memory around appropriately.
>
> The work I'm exploring right now will allow saving items from the
> pages being rebalanced, and will increase the aggression of page
> moving without being so brain-damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing if
> this simple branch is an improvement, and if so how much.
>
> I'll push more code to the branch, but the changes should be gated
> behind a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> No worries man, you're doing us a favor. Let me know if there's
> anything you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> Hey,
>
> I'm still really interested in working on this. I'll be taking a
> careful look soon, I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> I've tweaked the program slightly, so I'm adding a new version. It
> prints more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my
> path (who knows how...). Using the actual one that I've built works.
> Sorry for the confusion... can't believe I didn't realize that before.
> I'm testing against the compiled one now to see how it behaves.
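>
> (For future reference, the quick check that would have caught it --
> confirm which binary is first on $PATH and what version it reports:
>
>   $ which memcached
>   $ memcached -h | head -1
>
> An older install earlier in $PATH silently shadows a new build.)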

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> The command line I've used that will start is:
>
> memcached -m 64 -o slab_reassign,slab_automove
>
> The ones that fail are:
>
> memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> memcached -o lru_crawler
>
> I'm sure I've missed something during compile, though I just used
> ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> I've attached a pretty simple program to connect, fill a slab with
> data, and then slowly fill another slab with data of a different size.
> I've been trying to get memcached to run with the lru_crawler and
> lru_maintainer flags, but I get 'Illegal suboption "(null)"' every
> time I try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly
> installed 1.4.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> I realize I've not given you the tests to reproduce the behavior. I
> should be able to soon. Sorry about the delay here.
>
> In the meantime, I wanted to bring up a possible secondary use of the
> same logic that moves items on slab rebalancing. I think the system
> might benefit from using the same logic to crawl the pages in a slab
> and compact the data in the background. In the case where we have
> memory that is assigned to the slab but not being used (because of
> replaced or TTL'd-out data), returning the memory to a pool of free
> memory would allow a slab to grow with that memory first, instead of
> waiting for an event where memory is needed at that instant.
>
> It's a change in approach, from reactive to proactive. What do you
> think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > First, more detail for you:
> >
> > We are running 1.4.24 in production and haven't noticed any bugs as
> > of yet. The new LRUs seem to be working well, though we nearly always
> > run memcached scaled to hold all data without evictions. Those with
> > evictions are behaving well. Those without evictions haven't seen
> > crashing or any other noticeable bad behavior.
>
> Neat.
>
> > OK, I think I see an area where I was speculating on functionality.
> > If you have a key in slab 21 and then the same key is written again
> > at a larger size in slab 23, I assumed that the space in 21 was not
> > freed on the second write. With that assumption, the LRU crawler
> > would not free up that space. Also, just from observation in the
> > macro, the space is not freed fast enough to be effective, in our use
> > case, to accept the writes that are happening. Think in the hundreds
> > of millions of "overwrites" in a 6-10 hour period across a cluster.
>
> Internally, "items" (a key/value pair) are generally immutable. The
> only time they're not is for INCR/DECR, and an item still becomes
> immutable if two INCR/DECRs collide.
>
> What this means is that the new item is staged in a piece of free
> memory while the "upload" stage of the SET happens. When memcached has
> all of the data in memory to replace the item, it does an internal
> swap under a lock. The old item is removed from the hash table and
> LRU, and the new item gets put in its place (at the head of the LRU).
>
> Since items are refcounted, this means that if other users are
> downloading an item which just got replaced, their memory doesn't get
> corrupted by the item changing out from underneath them. They can
> continue to read the old item until they're done. When the refcount
> reaches zero, the old memory is reclaimed.
>
> Most of the time the item replacement happens, then the old memory is
> immediately removed.
>
> However, this does mean that you need *one* piece of free memory to
> replace the old one. The old memory gets freed after that set.
>
> So if you take a memcached instance with 0 free chunks and do a
> rolling replacement of all items (within the same slab class as
> before), the first SET would cause an eviction from the tail of the
> LRU to get a free chunk. Every SET after that would use the chunk
> freed by the replacement of the previous memory.
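>
> (To make that lifecycle concrete, here's a toy, self-contained sketch.
> The structs and names are invented for illustration -- this is not the
> actual memcached code:)
>
>   /* An item is reclaimed only once it is both unlinked from the
>    * hash table/LRU and no reader still holds a reference. */
>   #include <stdio.h>
>   #include <stdlib.h>
>
>   typedef struct item {
>       int refcount;   /* hash table + in-flight readers */
>       int linked;     /* still reachable from hash table/LRU? */
>       char data[32];
>   } item;
>
>   static void item_release(item *it) {
>       if (--it->refcount == 0 && !it->linked) {
>           printf("reclaiming '%s'\n", it->data);
>           free(it);
>       }
>   }
>
>   /* The "swap under a lock": unlink the old item, link the new one.
>    * In-flight readers of the old item keep a valid pointer. */
>   static item *item_replace(item *old, const char *newdata) {
>       item *it = calloc(1, sizeof(item));
>       snprintf(it->data, sizeof(it->data), "%s", newdata);
>       it->linked = 1;
>       it->refcount = 1;   /* the hash table's reference */
>       old->linked = 0;    /* old item leaves hash table + LRU */
>       item_release(old);  /* drop the hash table's reference */
>       return it;
>   }
>
>   int main(void) {
>       item *v1 = calloc(1, sizeof(item));
>       snprintf(v1->data, sizeof(v1->data), "%s", "value-day-1");
>       v1->linked = 1;
>       v1->refcount = 1;   /* linked into the hash table */
>
>       v1->refcount++;     /* a reader is mid-download of v1 */
>       item *v2 = item_replace(v1, "value-day-2");
>       item_release(v1);   /* reader finishes: v1 is reclaimed */
>
>       v2->linked = 0;     /* teardown */
>       item_release(v2);
>       return 0;
>   }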
>
> > After that last sentence I realized I also may not have explained the
> > access pattern well enough. The keys are all overwritten every day,
> > but it takes some time to write them all (obviously). We see a huge
> > increase in the bytes metric, as if the new data for the old keys
> > were being written for the first time. Since the "old" slab for the
> > same key doesn't proactively release memory, it starts to fill up the
> > cache and then starts evicting data in the new slab. Once that
> > happens, we see evictions in the old slab because of the algorithm
> > you mentioned (random picking / freeing of memory). Typically we
> > don't see any use for "upgrading" an item, as the new data would be
> > entirely new and should wholesale replace the old data for that key.
> > More specifically, the operation is always set, with different data
> > each day.
>
> Right. Most of your problems will come from two areas. One: when you
> write data aggressively into the new slab class (unless you set the
> rebalancer to always-replace mode), the mover will make memory
> available more slowly than you can insert, so you'll cause extra
> evictions in the new slab class.
>
> The secondary problem is the random evictions in the previous slab
> class as stuff is chucked on the floor to make memory moveable.
>
> > As for testing, we'll be able to put it under real production
> > workload. I don't know what kind of data you mean you need for
> > testing. The data stored in the caches are highly confidential. I
> > can give you all kinds of metrics, since we collect most of the ones
> > that are in the stats output and some from the stats slabs output.
> > If you have some specific ones that need collecting, I'll double
> > check and make sure we can get those. Alternatively, it might be
> > most beneficial to see the metrics in person :)
>
> I just need stats snapshots here and there, and actually putting the
> thing under load. When I did the LRU work I had to beg for several
> months before anyone tested it with a production load. This slows
> things down and demotivates me from working on the project.
>
> Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
> probably be best.
>
> > I can create a driver program to reproduce the behavior on a smaller
> > scale. It would write e.g. 10k keys of 10k size, then rewrite the
> > same keys with different size data. I'll work on that and post it to
> > this thread when I can reproduce the behavior locally.
>
> Ok. There are slab rebalance unit tests in the t/ directory which do
> things like this, and I've used mc-crusher to slam the rebalancer.
> It's pretty easy to run one config to load up 10k objects, then flip
> to the other using the same key namespace.
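>
> (Roughly: one config to load the keys, then a second run at a bigger
> value size against the same instance. Option names here are from
> memory, so double-check mc-crusher's README:
>
>   send=ascii_set,conns=10,key_prefix=foo,key_count=10000,value_size=10000
>
> then flip to:
>
>   send=ascii_set,conns=10,key_prefix=foo,key_count=10000,value_size=15000
>
> to force the class migration.)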
>
> > Thanks,
> > Scott
> >
> > On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
> > > Hey,
> > >
> > > On Fri, 10 Jul 2015, Scott Mansfield wrote:
> > >
> > > > We've seen issues recently where we run a cluster that typically
> > > > has the majority of items overwritten in the same slab every
> > > > day, and a sudden change in data size evicts a ton of data,
> > > > affecting downstream systems. To be clear, that is our problem,
> > > > but I think there's a tweak in memcached that might be useful,
> > > > and another possible feature that would be even better.
> > > >
> > > > The data that is written to this cache is overwritten every day,
> > > > though the TTL is 7 days. One slab takes up the majority of the
> > > > space in the cache. The application wrote e.g. 10KB (slab 21)
> > > > every day for each key consistently. One day, a change occurred
> > > > where it started writing 15KB (slab 23), causing a migration of
> > > > data from one slab to another. We had -o
> > > > slab_reassign,slab_automove=1 set on the server, causing large
> > > > numbers of evictions on the initial slab. Let's say the cache
> > > > could hold the data at 15KB per key, but the old data was not
> > > > technically TTL'd out in its old slab. This means that memory
> > > > was not being freed by the lru crawler thread (I think) because
> > > > its expiry had not come around.
> > > >
> > > > lines 1199 and 1200 in items.c:
> > > > if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
> > > >
> > > > If there was a check to see if this data was "orphaned," i.e.
> > > > that the key, if accessed, would map to a different slab than
> > > > the current one, then these orphans could be reclaimed as free
> > > > memory. I am working on a patch to do this, though I have
> > > > reservations about performing a hash on the key on the lru
> > > > crawler thread (if the hash is not already available). I have
> > > > very little experience in the memcached codebase, so I don't
> > > > know the most efficient way to do this. Any help would be
> > > > appreciated.
> > >
> > > There seems to be a misconception about how the slab classes work.
> > > A key, if already existing in a slab, will always map to the slab
> > > class it currently fits into. The slab classes always exist, but
> > > the amount of memory reserved for each of them will shift with
> > > slab_reassign. ie: 10 pages in slab class 21, then memory pressure
> > > on 23 causes it to move over.
> > >
> > > So if you examine a key that still exists in slab class 21, it has
> > > no reason to move up or down the slab classes.
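> > >
> > > (For intuition on why ~10KB and ~15KB items land in different
> > > classes at all, here's a toy calculation using a 96-byte minimum
> > > chunk and a 1.25 growth factor, which roughly matches the
> > > defaults; real class numbers differ a bit due to alignment:)
> > >
> > >   #include <stdio.h>
> > >
> > >   int main(void) {
> > >       double chunk = 96.0;  /* smallest chunk size */
> > >       int cls = 1;
> > >       int sizes[] = { 10240, 15360, 0 };  /* ~10KB, ~15KB items */
> > >       for (int i = 0; sizes[i] != 0; ) {
> > >           if (chunk >= sizes[i]) {  /* smallest class that fits */
> > >               printf("%d-byte item -> class %d (chunk %.0f)\n",
> > >                      sizes[i], cls, chunk);
> > >               i++;
> > >           } else {
> > >               chunk *= 1.25;  /* each class is 1.25x the previous */
> > >               cls++;
> > >           }
> > >       }
> > >       return 0;
> > >   }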
> > >
> > > > Alternatively, and possibly more beneficial, is compaction of
> > > > data in a slab using the same set of criteria as lru crawling.
> > > > Understandably, compaction is a very difficult problem to
> > > > solve, since moving the data would be a pain in the ass. I saw
> > > > a couple of discussions about this in the mailing list, though
> > > > I didn't see any firm thoughts about it. I think it can
> > > > probably be done in O(1) like the lru crawler by limiting the
> > > > number of items it touches each time. Writing and reading are
> > > > doable in O(1), so moving should be as well. Has anyone given
> > > > more thought to compaction?
> > >
> > > I'd be interested in hacking this up for you folks if you can
> > > provide me testing and some data to work with. With all of the
> > > LRU work I did in 1.4.24, the next thing I wanted to do was a big
> > > improvement to the slab reassignment code.
> > >
> > > Currently it picks an essentially random slab page, empties it,
> > > and moves the slab page into the class under pressure.
> > >
> > > One thing we can do is first check for free memory in the
> > > existing slab, ie (there's a toy sketch of this after the lists
> > > below):
> > >
> > > - Take a page from slab 21
> > > - Scan the page for valid items which need to be moved
> > > - Pull free memory from slab 21, migrate the item (moderately
> > >   complicated)
> > > - When the page is empty, move it (or give up if you run out of
> > >   free chunks).
> > >
> > > The next step is to pull from the LRU on slab 21:
> > >
> > > - Take a page from slab 21
> > > - Scan the page for valid items
> > > - Pull free memory from slab 21, migrate the item
> > >   - If no memory is free, evict the tail of slab 21 and use that
> > >     chunk.
> > > - When the page is empty, move it.
> > >
> > > Then, when you hit this condition, your least-recently-used data
> > > gets culled as new data migrates into your page's class. This
> > > should match a natural occurrence, since you would already be
> > > evicting valid (but old) items to make room for new items.
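> > >
> > > (That toy sketch -- a runnable miniature of the evacuation loop,
> > > with arrays standing in for pages and chunks. All names are
> > > invented; the real internals look nothing like this:)
> > >
> > >   #include <stdio.h>
> > >
> > >   #define PAGES  2
> > >   #define CHUNKS 4
> > >
> > >   /* 0 = free chunk, nonzero = a live item id ("class 21") */
> > >   static int class21[PAGES][CHUNKS] = {
> > >       { 101, 0, 102, 0 },  /* page 0: the page we want to empty */
> > >       { 0, 201, 0, 0 },    /* page 1: has free chunks to use */
> > >   };
> > >
> > >   static int find_free(int skip, int *pg, int *ch) {
> > >       for (int p = 0; p < PAGES; p++) {
> > >           if (p == skip) continue;
> > >           for (int c = 0; c < CHUNKS; c++)
> > >               if (class21[p][c] == 0) {
> > >                   *pg = p; *ch = c; return 1;
> > >               }
> > >       }
> > >       return 0;  /* none free: evict the LRU tail, or give up */
> > >   }
> > >
> > >   int main(void) {
> > >       int victim = 0;  /* the page being reassigned */
> > >       for (int c = 0; c < CHUNKS; c++) {
> > >           int pg, ch;
> > >           if (class21[victim][c] == 0)
> > >               continue;  /* chunk already free */
> > >           if (!find_free(victim, &pg, &ch)) {
> > >               puts("out of free chunks: evict or give up");
> > >               return 1;
> > >           }
> > >           class21[pg][ch] = class21[victim][c];  /* migrate */
> > >           class21[victim][c] = 0;                /* chunk freed */
> > >           printf("moved item %d to page %d chunk %d\n",
> > >                  class21[pg][ch], pg, ch);
> > >       }
> > >       puts("page 0 empty; reassign it to the class under pressure");
> > >       return 0;
> > >   }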
> > >
> > > A bonus to using the free-memory trick is that I can use the
> > > amount of free space in a slab class as a heuristic to more
> > > quickly move slab pages around.
> > >
> > > If it's still necessary from there, we can explore "upgrading"
> > > items to a new slab class, but that is much, much more
> > > complicated, since the item has to shift LRUs. Do you put it at
> > > the head, the tail, the middle, etc.? It might be impossible to
> > > make a good generic decision there.
> > >
> > > What version are you currently on? If 1.4.24, have you seen any
> > > instability? I'm currently torn between fighting a few bugs and
> > > starting on ...
