Re: Check for orphaned items in lru crawler thread

dormando Thu, 01 Oct 2015 14:33:08 -0700

Any chance you could describe (perhaps privately?) in very broad strokes
what the write load looks like? (They're only getting writes, too?)
Otherwise I'll have to devise arbitrary torture tests. I'm sure the bug's
in there, but it's not obvious yet.


On Thu, 1 Oct 2015, dormando wrote:

> perfect, thanks! I have $dayjob as well, but I'll look into this as soon as
> I can. My torture test machines are in a box, but I'll try to borrow one.
>
> On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
> > Yes. Exact args:
> > -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
> >
> > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
> >       Were lru_maintainer/lru_crawler/etc enabled though? Even if the slab mover is
> >       off, those two were the big changes in .24.
> >
> >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >
> >       > The same cluster has more than 400 servers happily running 1.4.24. It's been our standard deployment for a while now, and we haven't seen any crashes. The servers in the same cluster running 1.4.24 (with the same write load the new build was taking) have been up for 29 days. The start options do not contain the slab_automove option because it wasn't effective for us before. The memory given may differ slightly per server, as we calculate on startup how much to give. It's in the same ballpark, though (~56 gigs).
> >       >
> >       > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> >       >       Just before I sit in and try to narrow this down: have you run any host on
> >       >       1.4.24 mainline with those same start options? just in case the crash is
> >       >       older
> >       >
> >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >       >
> >       >       > Another message for you:
> >       >       > [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
> >       >       >
> >       >       >
> >       >       > addr2line shows:
> >       >       >
> >       >       > $ addr2line -e memcached 412b9d
> >       >       >
> >       >       > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
> >       >       >
> >       >       >
> >       >       >
> >       >       > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> >       >       >       Ok, thanks!
> >       >       >
> >       >       >       I'll noodle this a bit... unfortunately a backtrace might be more helpful.
> >       >       >       I'll ask you to attempt to get one if I don't figure anything out in time.
> >       >       >
> >       >       >       (allow it to core dump or attach a GDB session and set an ignore handler
> >       >       >       for sigpipe/int/etc and run "continue")
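A minimal C sketch of the "allow it to core dump" step, assuming a Linux/POSIX host: it raises the soft `RLIMIT_CORE` limit up to the hard limit, the same limit `ulimit -c` adjusts from a shell. `enable_core_dumps` is a hypothetical helper, not part of memcached, and where the core file actually lands is still decided by the kernel's `core_pattern` setting.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so a crash (SIGSEGV, SIGBUS, ...) can leave a core file behind.
 * Raising the soft limit up to the hard limit needs no privileges.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
static int enable_core_dumps(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max; /* soft limit may not exceed the hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling something like this early at startup (or raising the limit in the init script before launching) means the next crash can be loaded with `gdb memcached core`, where `bt` prints the backtrace.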
> >       >       >
> >       >       >       what were your full startup args, though?
> >       >       >
> >       >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >       >       >
> >       >       >       > The commit was the latest in slab_rebal_next at the time:
> >       >       >       > https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
> >       >       >       >
> >       >       >       > addr2line gave me this output:
> >       >       >       >
> >       >       >       > $ addr2line -e memcached 0x40e007
> >       >       >       >
> >       >       >       > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
> >       >       >       >
> >       >       >       >
> >       >       >       > Also, this was running with production writes, but not reads. Even if we had reads on with the few servers crashing, we're ok architecturally. That's why I can get it out there without worrying too much. For now, I'm going to turn it off. I had a metrics issue anyway that needs to get fixed. Tomorrow I'm planning to test again with more metrics, but I can get any new code in pretty quick.
> >       >       >       >
> >       >       >       >
> >       >       >       > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> >       >       >       >       How many servers were you running it on? I hope it wasn't more than a
> >       >       >       >       handful. I'd recommend starting with one :P
> >       >       >       >
> >       >       >       >       can you do an addr2line? what were your startup args, and what was the
> >       >       >       >       commit sha1 for the branch you pulled?
> >       >       >       >
> >       >       >       >       sorry about that :/
> >       >       >       >
> >       >       >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >       >       >       >
> >       >       >       >       > A few different servers (5 / 205) experienced a segfault, all within an hour or so. Unfortunately, at this point I'm a bit out of my depth. I have the dmesg output, which is identical for all 5 boxes:
> >       >       >       >       >
> >       >       >       >       > [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
> >       >       >       >       >
> >       >       >       >       >
> >       >       >       >       > I can possibly supply the binary file if needed, though we didn't do anything besides the standard setup and compile.
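The dmesg line above packs a few decodable fields. As a sketch, assuming x86-64 Linux: `segfault at 0` is the faulting address, and `error 4` is the page-fault error code, whose low bits flag protection violation, write access and user mode. Decoding 4 gives a user-mode read of a non-present page, which together with address 0 points at a NULL pointer read. `memcached[400000+1d000]` is the mapping base and size; 0x400000 is the usual load address of a non-PIE x86-64 executable, which is why the absolute `ip` could be handed straight to `addr2line`. The helper names are hypothetical. The later `general protection ... error:0` trap is a different exception whose error code does not use this bit layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bits of the x86 page-fault error code printed as "error N" in
 * "segfault at ADDR ip ... error N" kernel log lines. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access */
    PF_USER  = 1 << 2, /* 0: kernel mode,      1: user mode */
    PF_RSVD  = 1 << 3, /* a reserved page-table bit was set */
    PF_INSTR = 1 << 4  /* the access was an instruction fetch */
};

/* Hypothetical helper: write a short description of a page-fault code. */
static void describe_pf_error(unsigned code, char *buf, size_t len) {
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write" : "read";
    snprintf(buf, len, "%s-mode %s, %s%s",
             (code & PF_USER) ? "user" : "kernel",
             access,
             (code & PF_PROT) ? "protection violation" : "page not present",
             (code & PF_RSVD) ? ", reserved bit set" : "");
}

/* Hypothetical helper: offset of a faulting ip inside a mapping printed as
 * "name[base+size]". Returns -1 if ip lies outside the mapping. */
static int64_t ip_offset(uint64_t ip, uint64_t base, uint64_t size) {
    if (ip < base || ip - base >= size)
        return -1;
    return (int64_t)(ip - base);
}
```

For a position-independent build the offset from `ip_offset`, rather than the absolute `ip`, is what `addr2line` expects.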
> >       >       >       >       >
> >       >       >       >       >
> >       >       >       >       >
> >       >       >       >       > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> >       >       >       >       >       If you look at the new branch, there's a commit explaining the new stats.
> >       >       >       >       >
> >       >       >       >       >       You can watch slab_reassign_evictions vs slab_reassign_saves. you can also
> >       >       >       >       >       test automove=1 vs automove=2 (please also turn on the lru_maintainer and
> >       >       >       >       >       lru_crawler).
> >       >       >       >       >
> >       >       >       >       >       The initial branch you were running didn't add any new stats. It just
> >       >       >       >       >       restored an old feature.
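To watch those two counters, a small C sketch that pulls named values out of the text returned by the `stats` command (lines of the form `STAT <name> <value>`, terminated by `END`). The names `slab_reassign_saves` and `slab_reassign_evictions` are the ones mentioned for this development branch and may differ in released versions; reading their ratio as "share of items rescued when a page was moved" is an assumption about what they count. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Find one counter in the text returned by the "stats" command, whose
 * lines look like "STAT <name> <value>\r\n" and which ends with "END\r\n".
 * Returns true and stores the value in *out if `name` was found. */
static bool stat_lookup(const char *resp, const char *name,
                        unsigned long long *out) {
    size_t nlen = strlen(name);
    const char *line = resp;

    while (line && *line && strncmp(line, "END", 3) != 0) {
        /* require an exact name match: the next char must be the space */
        if (strncmp(line, "STAT ", 5) == 0 &&
            strncmp(line + 5, name, nlen) == 0 && line[5 + nlen] == ' ') {
            *out = strtoull(line + 5 + nlen + 1, NULL, 10);
            return true;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* move to the start of the next line */
    }
    return false;
}

/* Share of items on moved pages that were rescued rather than evicted.
 * Assumes (unverified) that the two branch-specific counters count
 * disjoint outcomes. Returns -1.0 when there is nothing to compare. */
static double rescue_ratio(const char *resp) {
    unsigned long long saves = 0, evictions = 0;

    stat_lookup(resp, "slab_reassign_saves", &saves);
    stat_lookup(resp, "slab_reassign_evictions", &evictions);
    if (saves + evictions == 0)
        return -1.0;
    return (double)saves / (double)(saves + evictions);
}
```

Comparing this ratio between an automove=1 run and an automove=2 run gives one number per mode rather than two raw counters.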
> >       >       >       >       >
> >       >       >       >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
> >       >       >       >       >
> >       >       >       >       >       > An unrelated prod problem meant I had to stop after about an hour. I'm turning it on again tomorrow morning.
> >       >       >       >       >       > Are there any new metrics I should be looking at? Anything new in the stats output? I'm about to take a look at the diffs as well.
> >       >       >       >       >       >
> >       >       >       >       >       > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> >       >       >       >       >       >       excellent. if automove=2 is too aggressive you'll see that show up as a
> >       >       >       >       >       >       hit ratio reduction.
> >       >       >       >       >       >
> >       >       >       >       >       >       the new branch works with automove=2 as well, but it will attempt to
> >       >       >       >       >       >       rescue valid items in the old slab if possible. I'll still be working on
> >       >       >       >       >       >       it for another few hours today though. I'll mail again when I'm done.
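Because the `stats` counters are cumulative from process start, a hit-ratio drop caused by a more aggressive automove setting shows up more clearly over an interval than in lifetime totals. A C sketch, assuming two snapshots of `get_hits` and `get_misses` taken from the same running process (a restart resets the counters); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Cumulative counters from one "stats" snapshot. */
struct hit_snapshot {
    unsigned long long get_hits;
    unsigned long long get_misses;
};

/* Hit ratio over the interval between two snapshots of the same process.
 * Deltas keep a long uptime from masking a recent drop. Returns -1.0 if
 * no gets happened in the interval. */
static double interval_hit_ratio(struct hit_snapshot before,
                                 struct hit_snapshot after) {
    /* counters only grow while the process is up; a restart resets them */
    assert(after.get_hits >= before.get_hits);
    assert(after.get_misses >= before.get_misses);

    unsigned long long hits = after.get_hits - before.get_hits;
    unsigned long long misses = after.get_misses - before.get_misses;
    if (hits + misses == 0)
        return -1.0;
    return (double)hits / (double)(hits + misses);
}
```

For example, going from 900 hits / 100 misses to 1800 hits / 400 misses gives an interval ratio of 0.75, while the lifetime totals would still read about 0.82.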
> >       >       >       >       >       >
> >       >       >       >       >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
> >       >       >       >       >       >
> >       >       >       >       >       >       > I have the first commit (slab_automove=2) running in prod right now. Later today will be a full-load production test of the latest code. I'll just let it run for a few days unless I spot any problems. We have good metrics for latency et al. from the client side, though network time normally dwarfs memcached time.
> >       >       >       >       >       >       >
> >       >       >       >       >       >       > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> >       >       >       >       >       >       >       That's unfortunate.
> >       >       >       >       >       >       >
> >       >       >       >       >       >       >       I've done some more work on the branch:
> >       >       >       >       >       >       >       https://github.com/memcached/memcached/pull/112
> >       >       >       >       >       >       >
> >       >       >       >       >       >       >       It's not all that likely you would see enough of an improvement from the
> >       >       >       >       >       >       >       new default mode. However, if your item sizes change gradually, items are
> >       >       >       >       >       >       >       reclaimed during expiration, or get overwritten (and thus freed in the old
> >       >       >       >       >       >       >       class), it should work just fine. I have another patch coming which should
> >       >       >       >       >       >       >       help though.
> >       >       >       >       >       >       >
> >       >       >       >       >       >       >       Open to feedback from any interested party.
> >       >       >       >       >       >       >
> >       >       >       >       >       >       >       On Fri, 25 Sep 2015, Scott Mansfield wrote:
> >       >       >       >       >       >       >
> >       >       >       >       >       >       >       > I have it running internally, and it runs fine under normal load. It's difficult to put it into the line of fire for a production workload for social reasons... Also, it's a degenerate case that we normally don't run into (and actively try to avoid). I'm going to run some heavier load tests on it today.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       I'm working on getting a test going internally. I'll let you know how it goes.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       > Scott Mansfield
> >       >       >       >       >       >       >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> >       >       >       >       >       >       >       >       Yo,
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       https://github.com/dormando/memcached/commits/slab_rebal_next - would you
> >       >       >       >       >       >       >       >       mind playing around with the branch here? You can see the start options in
> >       >       >       >       >       >       >       >       the test.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       This is a dead simple modification (a restoration of a feature that was
> >       >       >       >       >       >       >       >       already there...). The test very aggressively writes and is able to shunt
> >       >       >       >       >       >       >       >       memory around appropriately.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       The work I'm exploring right now will allow saving items being
> >       >       >       >       >       >       >       >       rebalanced from, and increasing the aggression of page moving without
> >       >       >       >       >       >       >       >       being so brain damaged about it.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       But while I'm poking around with that, I'd be interested in knowing if
> >       >       >       >       >       >       >       >       this simple branch is an improvement, and if so how much.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       I'll push more code to the branch, but the changes should be gated behind
> >       >       >       >       >       >       >       >       a feature flag.
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       > No worries man, you're doing us a favor. Let me know if there's anything you need from us, and I promise I'll be quicker this time :)
> >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> >       >       >       >       >       >       >       >       >       Hey,
> >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       I'm still really interested in working on this. I'll be taking a careful
> >       >       >       >       >       >       >       >       >       look soon I hope.
> >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       On Mon, 3 Aug 2015, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       > I've tweaked the program slightly, so I'm adding a new version. It prints more stats as it goes and runs a bit faster.
> >       >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       >       >       Total brain fart on my part. Apparently I had memcached 1.4.13 on my path (who knows how...). Using the actual one that I've built works. Sorry for the confusion... can't believe I didn't realize that before. I'm testing against the compiled one now to see how it behaves.
> >       >       >       >       >       >       >       >       >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> >       >       >       >       >       >       >       >       >       >             You sure that's 1.4.24? None of those fail for me :(
> >       >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       >             On Mon, 3 Aug 2015, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       >       >
> >       >       >       >       >       >       >       >       >       >             > The command line I've used that will start is:
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > the ones that fail are:
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > memcached -o lru_crawler
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > I'm sure I've missed something during compile, though I just used ./configure and make.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       >       >             >       I've attached a pretty simple program to connect, fill a slab with data, and then fill another slab slowly with data of a different size. I've been trying to get memcached to run with the lru_crawler and lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I try to start with either, in any configuration.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       I haven't seen it start to move slabs automatically with a freshly installed 1.4.24.
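Why a rewrite at a larger size lands in a different slab class (slab 21 vs. slab 23 later in the thread) follows from how classes are sized. A C sketch that approximates the 1.4-era scheme under these assumptions: the first chunk is 96 bytes (a 64-bit build with default settings), each class grows by the default factor of 1.25 (the `-f` option), sizes round up to 8 bytes, and the final class is the maximum item size (`-I`). `build_classes` and `class_for` are illustrative helpers, not memcached's own code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 64

/* Approximate slab class table: start at `base`, round each size up to
 * `align` bytes, grow by `factor`, and stop once the next class would
 * exceed max_item / factor; the final class is max_item itself.
 * Returns the number of classes written to sizes[]. */
static int build_classes(size_t base, double factor, size_t align,
                         size_t max_item, size_t sizes[MAX_CLASSES]) {
    int n = 0;
    size_t size = base;

    while (n < MAX_CLASSES - 1 && (double)size <= (double)max_item / factor) {
        if (size % align)
            size += align - size % align; /* round up to the alignment */
        sizes[n++] = size;
        size = (size_t)((double)size * factor);
    }
    sizes[n++] = max_item; /* largest class holds one maximum-size item */
    return n;
}

/* 1-based class id for an item of item_size bytes (header + key + value):
 * the first class whose chunks are big enough, or 0 if it fits nowhere. */
static int class_for(size_t item_size, const size_t *sizes, int n) {
    for (int i = 0; i < n; i++)
        if (item_size <= sizes[i])
            return i + 1;
    return 0;
}
```

With these assumed defaults the table starts at 96, 120, 152, 192 and 240 bytes, so a 100-byte item (header, key and value together) needs class 2, and a value that grows past a chunk boundary moves its key to a higher class.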
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> >       >       >       >       >       >       >       >       >       >             >             I realize I've not given you the tests to reproduce the behavior. I should be able to soon. Sorry about the delay here.
> >       >       >       >       >       >       >       >       >       >             > In the meantime, I wanted to bring up a possible secondary use of the same logic to move items on slab rebalancing. I think the system might benefit from using the same logic to crawl the pages in a slab and compact the data in the background. In the case where we have memory that is assigned to the slab but not being used because of replaced or TTL'd out data, returning the memory to a pool of free memory will allow a slab to grow with that memory first instead of waiting for an event where memory is needed at that instant.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > It's a change in approach, from reactive to proactive. What do you think?
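A toy C model of the proposed background compaction, purely hypothetical: each page holds a fixed number of chunks, a crawler pass frees chunks whose items have expired, and a page with no live items is reported as returnable to a shared free pool before any class hits memory pressure. Replaced items are not modeled because, as explained further down the thread, their chunks are already freed at replace time. Memcached's actual LRU crawler walks the LRUs rather than pages, and page ownership is the slab rebalancer's job; all names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define CHUNKS_PER_PAGE 8 /* hypothetical; real pages hold page_size / chunk_size */

/* Hypothetical chunk: `used` marks a linked item, `expires` its TTL
 * deadline (no "never expires" value is modeled). */
struct chunk {
    bool used;
    time_t expires;
};

struct page {
    struct chunk chunks[CHUNKS_PER_PAGE];
};

/* One background pass over a page: free every expired item and report
 * whether the page is now completely empty, i.e. could be handed back to
 * a global pool before any slab class runs short of memory.
 * *freed receives the number of chunks reclaimed by this pass. */
static bool crawl_page(struct page *p, time_t now, int *freed) {
    int live = 0;

    *freed = 0;
    for (int i = 0; i < CHUNKS_PER_PAGE; i++) {
        struct chunk *c = &p->chunks[i];
        if (!c->used)
            continue;             /* already free */
        if (c->expires <= now) {
            c->used = false;      /* TTL'd out: reclaim the chunk */
            (*freed)++;
        } else {
            live++;
        }
    }
    return live == 0;
}
```

The design choice being illustrated is the proactive one from the proposal: the page is identified as empty during an idle-time pass instead of at the moment another class needs memory.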
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             > On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> >       >       >       >       >       >       >       >       >       >             >       > First, more detail for you:
> >       >       >       >       >       >       >       >       >       >             >       >
> >       >       >       >       >       >       >       >       >       >             >       > We are running 1.4.24 in production and haven't noticed any bugs as of yet. The new LRUs seem to be working well, though we nearly always run memcached scaled to hold all data without evictions. Those with evictions are behaving well. Those without evictions haven't seen crashing or any other noticeable bad behavior.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       Neat.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       >
> >       >       >       >       >       >       >       >       >       >             >       > OK, I think I see an area where I was speculating on functionality. If you have a key in slab 21 and then the same key is written again at a larger size in slab 23, I assumed that the space in 21 was not freed on the second write. With that assumption, the LRU crawler would not free up that space. Also, just by observation in the macro, the space is not freed fast enough to be effective, in our use case, to accept the writes that are happening. Think in the hundreds of millions of "overwrites" in a 6 - 10 hour period across a cluster.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       Internally, "items" (a key/value pair) are generally immutable. The only
> >       >       >       >       >       >       >       >       >       >             >       time when it's not is for INCR/DECR, and it still becomes immutable if two
> >       >       >       >       >       >       >       >       >       >             >       INCR/DECR's collide.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       What this means is that the new item is staged in a piece of free memory
> >       >       >       >       >       >       >       >       >       >             >       while the "upload" stage of the SET happens. When memcached has all of the
> >       >       >       >       >       >       >       >       >       >             >       data in memory to replace the item, it does an internal swap under a lock.
> >       >       >       >       >       >       >       >       >       >             >       The old item is removed from the hash table and LRU, and the new item gets
> >       >       >       >       >       >       >       >       >       >             >       put in its place (at the head of the LRU).
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       Since items are refcounted, this means that if other users are downloading
> >       >       >       >       >       >       >       >       >       >             >       an item which just got replaced, their memory doesn't get corrupted by the
> >       >       >       >       >       >       >       >       >       >             >       item changing out from underneath them. They can continue to read the old
> >       >       >       >       >       >       >       >       >       >             >       item until they're done. When the refcount reaches zero the old memory is
> >       >       >       >       >       >       >       >       >       >             >       reclaimed.
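The staged-item swap and refcounting described above can be sketched in C. This is a simplified model assuming one global mutex, one hash slot and plain `int` refcounts; the real server uses finer-grained item locks and atomic refcount updates, and every name here is hypothetical. The table's link counts as one reference, each reader adds one, and whoever drops the last reference frees the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Simplified item: refcount = the table's link (if linked) + each reader. */
struct item {
    int refcount;
    char value[32];
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *slot;  /* stands in for one hash-table bucket */
static int items_freed;    /* how many items have been reclaimed so far */

/* Stage a new item outside the lock; the caller owns one reference. */
static struct item *item_new(const char *value) {
    struct item *it = calloc(1, sizeof *it);
    if (it == NULL)
        abort();
    it->refcount = 1;
    strncpy(it->value, value, sizeof it->value - 1); /* calloc zeroed the rest */
    return it;
}

/* Drop one reference; the last holder frees the memory. Lock must be held. */
static void item_release_locked(struct item *it) {
    assert(it->refcount > 0);
    if (--it->refcount == 0) {
        free(it);
        items_freed++;
    }
}

/* Reader: pin the current item so a concurrent replace cannot free it. */
static struct item *item_get(void) {
    pthread_mutex_lock(&cache_lock);
    struct item *it = slot;
    if (it != NULL)
        it->refcount++;
    pthread_mutex_unlock(&cache_lock);
    return it;
}

static void item_put(struct item *it) {
    pthread_mutex_lock(&cache_lock);
    item_release_locked(it);
    pthread_mutex_unlock(&cache_lock);
}

/* SET: the new item was fully built before this call, so the critical
 * section is only a pointer swap. The caller's reference becomes the
 * table's link, and the old item loses the table's link. */
static void item_replace(struct item *new_it) {
    pthread_mutex_lock(&cache_lock);
    struct item *old = slot;
    slot = new_it;
    if (old != NULL)
        item_release_locked(old);
    pthread_mutex_unlock(&cache_lock);
}
```

A reader that called `item_get()` before a concurrent `item_replace()` keeps seeing the old value until it calls `item_put()`; only then is the old item freed.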
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       Most of the time, the item replacement happens and then the old memory is
> >       >       >       >       >       >       >       >       >       >             >       immediately removed.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       However, this does mean that you need *one* piece of free memory to
> >       >       >       >       >       >       >       >       >       >             >       replace the old one. Then the old memory gets freed after that set.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       So if you take a memcached instance with 0 free chunks, and do a rolling
> >       >       >       >       >       >       >       >       >       >             >       replacement of all items (within the same slab class as before), the first
> >       >       >       >       >       >       >       >       >       >             >       one would cause an eviction from the tail of the LRU to get a free chunk.
> >       >       >       >       >       >       >       >       >       >             >       Every SET after that would use the chunk freed from the replacement of the
> >       >       >       >       >       >       >       >       >       >             >       previous memory.
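The arithmetic of that rolling replacement can be checked with a tiny C simulation of one full slab class, counting chunks only (a hypothetical model, not memcached code): each SET stages the new item in a chunk first, evicting the LRU tail only when no chunk is free, and then frees the old item's chunk.

```c
#include <assert.h>

/* Toy slab class that only counts chunks (hypothetical model). */
struct slab_class {
    int total;     /* chunks assigned to the class */
    int used;      /* chunks holding linked items */
    int evictions; /* items dropped from the LRU tail to free a chunk */
};

/* Get a chunk for a new item: take a free one if any, otherwise evict the
 * LRU tail and reuse its chunk (so `used` does not change). */
static void alloc_chunk(struct slab_class *c) {
    if (c->used == c->total)
        c->evictions++;
    else
        c->used++;
}

/* SET replacing a key in the same class: stage the new item in a chunk
 * first, then unlink the old item, whose chunk goes back on the free list. */
static void replace_same_class(struct slab_class *c) {
    alloc_chunk(c);
    assert(c->used > 0);
    c->used--;
}
```

Replacing all 1,000 items of a full 1,000-chunk class this way costs exactly one eviction; every later SET reuses the chunk freed by the previous one.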
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       > After that last sentence I realized I also may not have explained the access pattern well enough. The keys are all overwritten every day, but it takes some time to write them all (obviously). We see a huge increase in the bytes metric, as if the new data for the old keys was being written for the first time. Since the "old" slab for the same key doesn't proactively release memory, it starts to fill up the cache and then starts evicting data in the new slab. Once that happens, we see evictions in the old slab because of the algorithm you mentioned (random picking / freeing of memory). Typically we don't see any use for "upgrading" an item, as the new data would be entirely new and should wholesale replace the old data for that key. More specifically, the operation is always set, with different data each day.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       Right. Most of your problems will come from two areas. The first is that
> >       >       >       >       >       >       >       >       >       >             >       if you write data aggressively into the new slab class (unless you set the
> >       >       >       >       >       >       >       >       >       >             >       rebalancer to always-replace mode), the mover will make memory available
> >       >       >       >       >       >       >       >       >       >             >       more slowly than you can insert. So you'll cause extra evictions in the
> >       >       >       >       >       >       >       >       >       >             >       new slab class.
> >       >       >       >       >       >       >       >       >       >             >
> >       >       >       >       >       >       >       >       >       >             >       The secondary problem is from the random evictions in the previous slab
> >       >       >       >       >       >       >       >       >       >             >       class as stuff is chucked on the floor to make memory moveable.
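The cross-class case differs from the same-class one: the freed chunk goes back to the *old* class, so the new class cannot reuse it unless capacity is moved. A hypothetical C model of the first problem described above, counting chunks only and treating both classes' chunks as interchangeable capacity units. Real classes differ in chunk size, and the real mover must first clear a whole page, which is where the second problem (random evictions in the old class) comes from; that part is not modeled.

```c
#include <assert.h>

/* Toy slab class counting chunks only; both classes' chunks are treated as
 * interchangeable capacity units, which real slab classes are not. */
struct cls {
    int total;     /* chunks of capacity assigned to the class */
    int used;      /* chunks holding linked items */
    int evictions; /* LRU-tail evictions needed to make room */
};

static void alloc_in(struct cls *c) {
    if (c->used == c->total)
        c->evictions++;   /* evict the tail, reuse its chunk */
    else
        c->used++;
}

/* Overwrite a key whose new value falls in a larger class: stage the new
 * item in `to`, then free the old chunk, which stays in `from`. */
static void replace_across(struct cls *from, struct cls *to) {
    alloc_in(to);
    assert(from->used > 0);
    from->used--;
}

/* Overwrite n keys, each moving from `a` to `b`. Every `mover_every` SETs
 * (0 = never) a hypothetical mover shifts one chunk of free capacity from
 * `a` to `b`. Returns the evictions suffered by `b`. */
static int rolling_cross_class(int n, struct cls *a, struct cls *b,
                               int mover_every) {
    for (int i = 1; i <= n; i++) {
        replace_across(a, b);
        if (mover_every > 0 && i % mover_every == 0 && a->used < a->total) {
            a->total--;
            b->total++;
        }
    }
    return b->evictions;
}
```

With 1,000 keys moving into a class that starts with room for 100, no mover yields 900 evictions in the new class while all 1,000 freed chunks sit idle in the old one; a mover that keeps pace with the writes yields none, and one moving capacity at half the insert rate lands in between.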
> >       >       >       >       >       >       >       >       >       >     
> >         >
> > > > As for testing, we'll be able to put it under a real production
> > > > workload. I don't know what kind of data you mean when you say you need
> > > > it for testing; the data stored in the caches is highly confidential.
> > > > I can give you all kinds of metrics, since we collect most of the ones
> > > > in the stats output and some from the stats slabs output. If you have
> > > > some specific ones that need collecting, I'll double check and make sure
> > > > we can get those. Alternatively, it might be most beneficial to see the
> > > > metrics in person :)
> > >
> > > I just need stats snapshots here and there, and someone actually putting
> > > the thing under load. When I did the LRU work I had to beg for several
> > > months before anyone tested it with a production load. This slows things
> > > down and demotivates me from working on the project.
> > >
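Collecting those snapshots mostly means scraping per-class counters. Below is a small C sketch for parsing individual lines of stats slabs output, assuming the usual "STAT <class>:<name> <value>" shape (for example "STAT 21:total_pages 10"). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed per-class line from "stats slabs", e.g. "STAT 21:total_pages 10".
 * Hypothetical type for illustration. */
struct slab_stat {
    unsigned int clsid;        /* slab class id, e.g. 21 */
    char name[64];             /* counter name, e.g. "total_pages" */
    unsigned long long value;  /* counter value */
};

/* Parse one line of "stats slabs" output. Returns 1 for per-class lines and
 * 0 for anything else (global lines such as "STAT active_slabs 2", "END"). */
static int parse_slab_stat(const char *line, struct slab_stat *out)
{
    assert(line != NULL && out != NULL);
    /* %63s stops at whitespace, so the name never swallows the value. */
    int n = sscanf(line, "STAT %u:%63s %llu",
                   &out->clsid, out->name, &out->value);
    return n == 3;
}
```

Global lines such as STAT active_slabs and the trailing END are rejected, so a caller can feed the raw response through line by line and keep only the per-class counters.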
> > > Unfortunately my dayjob keeps me pretty busy, so ~internet~ would
> > > probably be best.
> > >
> > > > I can create a driver program to reproduce the behavior on a smaller
> > > > scale. It would write e.g. 10k keys of 10KB each, then rewrite the same
> > > > keys with data of a different size. I'll work on that and post it to
> > > > this thread when I can reproduce the behavior locally.
> > >
> > > Ok. There are slab rebalance unit tests in the t/ directory which do
> > > things like this, and I've used mc-crusher to slam the rebalancer. It's
> > > pretty easy to run one config to load up 10k objects, then flip to the
> > > other using the same key namespace.
> > >
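A minimal building block for the kind of driver discussed above, written as a hypothetical C helper: it formats one text-protocol set command (set <key> <flags> <exptime> <bytes>, then the data block, each terminated by \r\n). The key naming scheme, the fill byte and the helper itself are assumptions for illustration; sending the bytes over a socket is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for a reproduction driver: build one memcached text
 * protocol "set" command for key "key:<i>" with a value of value_len bytes.
 * Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
 * Returns the number of bytes written (not NUL-terminated), or 0 if buf is
 * too small. */
static size_t format_set(char *buf, size_t buflen, unsigned int i,
                         size_t value_len, char fill, unsigned int exptime)
{
    assert(buf != NULL);
    int hdr = snprintf(buf, buflen, "set key:%u 0 %u %zu\r\n",
                       i, exptime, value_len);
    /* Reject encoding errors and anything that would not fit completely. */
    if (hdr < 0 || (size_t)hdr + value_len + 2 > buflen)
        return 0;
    /* Fill the data block with a repeated byte; only the size matters here. */
    memset(buf + hdr, fill, value_len);
    memcpy(buf + hdr + value_len, "\r\n", 2);
    return (size_t)hdr + value_len + 2;
}
```

A reproduction run would call it for i = 0..9999 with a 10KB value, then again for the same keys with a 15KB value, using an exptime of 604800 (the 7-day TTL mentioned in the thread).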
> > > > Thanks,
> > > > Scott
> > > >
> > > > On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
> > > > > Hey,
> > > > >
> > > > > On Fri, 10 Jul 2015, Scott Mansfield wrote:
> > > > >
> > > > > > We've seen issues recently where we run a cluster that typically has
> > > > > > the majority of items overwritten in the same slab every day, and a
> > > > > > sudden change in data size evicts a ton of data, affecting downstream
> > > > > > systems. To be clear, that is our problem, but I think there's a tweak
> > > > > > in memcached that might be useful and another possible feature that
> > > > > > would be even better.
> > > > > >
> > > > > > The data that is written to this cache is overwritten every day, though
> > > > > > the TTL is 7 days. One slab takes up the majority of the space in the
> > > > > > cache. The application wrote e.g. 10KB (slab 21) every day for each key
> > > > > > consistently. One day, a change occurred where it started writing 15KB
> > > > > > (slab 23), causing a migration of data from one slab to another. We had
> > > > > > -o slab_reassign,slab_automove=1 set on the server, causing large
> > > > > > numbers of evictions on the initial slab. Let's say the cache could hold
> > > > > > the data at 15KB per key, but the old data was not technically TTL'd out
> > > > > > in its old slab. This means that memory was not being freed by the lru
> > > > > > crawler thread (I think) because its expiry had not come around.
> > > > > >
> > > > > > lines 1199 and 1200 in items.c:
> > > > > > if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
> > > > > >
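The quoted condition can be exercised in isolation. The sketch below uses a hypothetical toy_item struct and a simplified flush check (the oldest_live comparison is an assumption modeled on flush_all, not a copy of memcached's is_flushed()). It shows that an item with days left on its TTL, and no flush, is never reclaimed by this condition, whatever slab class new writes are going to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t rel_time_t;   /* memcached keeps times relative to startup */

/* Hypothetical stand-in for the item header: only the fields the check needs. */
struct toy_item {
    rel_time_t exptime;   /* 0 means "never expires" */
    rel_time_t time;      /* last access time */
};

/* Hypothetical flush state: items last touched at or before this time count
 * as flushed. 0 means no flush has happened. Simplified model only. */
static rel_time_t oldest_live = 0;

static bool toy_is_flushed(const struct toy_item *it, rel_time_t current_time)
{
    return oldest_live != 0 && oldest_live <= current_time &&
           it->time <= oldest_live;
}

/* Same shape as the condition quoted from items.c: an item is reclaimable
 * only if it has a nonzero expiry in the past, or it was flushed. */
static bool crawler_can_reclaim(const struct toy_item *it,
                                rel_time_t current_time)
{
    return (it->exptime != 0 && it->exptime < current_time) ||
           toy_is_flushed(it, current_time);
}
```

Nothing in the check looks at item size or slab class, which matches the observation that the old 10KB items simply sat there until their TTL ran out.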
> > > > > > If there were a check to see if this data was "orphaned," i.e. that
> > > > > > the key, if accessed, would map to a different slab than the current
> > > > > > one, then these orphans could be reclaimed as free memory. I am working
> > > > > > on a patch to do this, though I have reservations about performing a
> > > > > > hash on the key on the lru crawler thread (if the hash is not already
> > > > > > available).
> > > > > > I have very little experience in the memcached codebase, so I don't
> > > > > > know the most efficient way to do this. Any help would be appreciated.
> > > > >
> > > > > There seems to be a misconception about how the slab classes work. A key,
> > > > > if already existing in a slab, will always map to the slab class it
> > > > > currently fits into. The slab classes always exist, but the amount of
> > > > > memory reserved for each of them will shift with slab_reassign, i.e.:
> > > > > 10 pages in slab class 21, then memory pressure on 23 causes some of
> > > > > those pages to move over.
> > > > >
> > > > > So if you examine a key that still exists in slab class 21, it has no
> > > > > reason to move up or down the slab classes.
> > > > >
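The point about slab classes can be illustrated with a simplified model of how an item's class is chosen: class chunk sizes grow geometrically by a factor, and an item goes into the smallest class whose chunks fit its total size. The helpers below are hypothetical, and the starting size and factor (96 bytes, 1.25) are example values; real chunk sizes also depend on the item header and the -n, -f and -I settings.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 64

/* Hypothetical, simplified model of slab class sizing: start at min_chunk
 * bytes, multiply by the growth factor each step, and round up to an 8-byte
 * boundary. Illustrative only; not memcached's exact slabs_init(). */
static size_t build_classes(size_t sizes[MAX_CLASSES], size_t min_chunk,
                            double factor, size_t max_item)
{
    assert(factor > 1.0 && min_chunk > 0);
    size_t n = 0;
    double size = (double)min_chunk;
    while (n < MAX_CLASSES - 1 && size < (double)max_item) {
        size_t s = (size_t)size;
        if (s % 8)
            s += 8 - (s % 8);          /* align chunk size to 8 bytes */
        sizes[n++] = s;
        size = (double)s * factor;
    }
    sizes[n++] = max_item;             /* last class holds the largest items */
    return n;
}

/* Return the 1-based id of the smallest class whose chunks fit an item of
 * total_size bytes (header + key + value), or 0 if it is too large. */
static unsigned int class_for_size(const size_t sizes[], size_t nclasses,
                                   size_t total_size)
{
    for (size_t i = 0; i < nclasses; i++)
        if (total_size <= sizes[i])
            return (unsigned int)(i + 1);
    return 0;
}
```

With these parameters a 10KB item and a 15KB item land in different classes. The class is a function of the item's size alone, so an item already sitting in a class stays there; only a new write of a different size goes to another class.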
> > > > > > Alternatively, and possibly more beneficial, is compaction of data in a
> > > > > > slab using the same set of criteria as lru crawling. Understandably,
> > > > > > compaction is a very difficult problem to solve since moving the data
> > > > > > would be a pain in the ass. I saw a couple of discussions about this in
> > > > > > the mailing list, though I didn't see any firm thoughts about it. I
> > > > > > think it can probably be done in O(1) like the lru crawler by limiting
> > > > > > the number of items it touches each time. Writing and reading are
> > > > > > doable in O(1), so moving should be as well. Has anyone given more
> > > > > > thought to compaction?
> > > > >
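The bounded-work idea can be sketched as a crawler that examines at most a fixed budget of slots per call and resumes from a saved cursor, so each call costs O(budget) however large the store is. This is a hypothetical C model over a toy slot array, not memcached's lru crawler; it only reclaims expired items and does not attempt the harder part of compaction (moving live items and fixing up hash and LRU links).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot in a toy item store. */
struct slot {
    bool used;
    unsigned int exptime;   /* 0 = never expires */
};

struct crawler {
    size_t cursor;          /* where the next step resumes */
};

/* Examine at most `budget` slots starting at the cursor, wrapping around at
 * nslots, and reclaim any expired items found. Returns how many were
 * reclaimed in this step. */
static size_t crawl_step(struct crawler *c, struct slot *slots, size_t nslots,
                         size_t budget, unsigned int now)
{
    assert(c != NULL && (slots != NULL || nslots == 0));
    size_t freed = 0;
    for (size_t n = 0; n < budget && n < nslots; n++) {
        struct slot *s = &slots[c->cursor];
        if (s->used && s->exptime != 0 && s->exptime < now) {
            s->used = false;          /* reclaim the expired item */
            freed++;
        }
        c->cursor = (c->cursor + 1) % nslots;
    }
    return freed;
}
```

Calling crawl_step repeatedly (for example once per background tick) eventually covers every slot while keeping each individual call short.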
> > > > > I'd be interested in hacking this up for you folks if you can provide me
> > > > > testing and some data to work with. With all of the LRU work I did in
> > > > > 1.4.24, the next thing I wanted to do is a big improvement on the slab
> > > > > reassignment code.
> > > > >
> > > > > Currently it picks essentially a random slab page, empties it, and moves
> > > > > the slab page into the class under pressure.
> > > > >
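The current mover's behavior, as described here, can be modeled in a few lines. In this hypothetical C sketch the first page found in the source class stands in for the "essentially random" pick, and each live item on it is evicted rather than relocated, which is where the evictions in the previous slab class come from. Page and chunk counts are toy values.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNKS_PER_PAGE 4   /* tiny pages so the example is easy to trace */

/* Hypothetical toy page: which slab class owns it and which chunks are live. */
struct page {
    unsigned int clsid;
    int used[CHUNKS_PER_PAGE];   /* 1 = chunk holds a live item */
};

/* Move one page from class src to class dst: pick a page of the source
 * class (here the first one found), evict whatever lives in it, and hand the
 * now-empty page to the class under pressure.
 * Returns the number of items evicted, or -1 if src owns no pages. */
static int reassign_page(struct page *pages, size_t npages,
                         unsigned int src, unsigned int dst)
{
    assert(src != dst);
    for (size_t p = 0; p < npages; p++) {
        if (pages[p].clsid != src)
            continue;
        int evicted = 0;
        for (int c = 0; c < CHUNKS_PER_PAGE; c++) {
            if (pages[p].used[c]) {
                pages[p].used[c] = 0;   /* live item is evicted, not moved */
                evicted++;
            }
        }
        pages[p].clsid = dst;           /* empty page now belongs to dst */
        return evicted;
    }
    return -1;
}
```

Checking a page's free chunks before choosing it (preferring pages that are already mostly empty) would reduce the evictions per move; the sketch deliberately does not do that.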
> > > > > One thing we can do is first examine for free memory in the existing
> > > > > slab, i.e.:
> >       >       >       >       >       >       >     ...
> >
> > --
> >
> > ---
> > You received this message because you are subscribed to the Google Groups 
> > "memcached" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to memcached+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
> >
> >
