Sorry for the data dumps here, but I want to give you everything I have. I 
found 3 more addresses that showed up in the dmesg logs:

$ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached $addr; done
.../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 (discriminator 1)
.../build/memcached-1.4.24-slab-rebal-next/items.c:312 (discriminator 1)
.../build/memcached-1.4.24-slab-rebal-next/items.c:1183


I still haven't tried to attach a debugger, since the error is infrequent 
enough that catching it live would be hard. Is there a handler I could add 
to dump a stack trace when it segfaults? I'd get a core dump otherwise, 
but the cores would be HUGE and contain confidential information.
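
To be concrete, something like this minimal sketch is what I had in mind 
(my own guess, not code that exists in memcached; and if the crash trashes 
the stack, backtrace() may not print anything useful):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void crash_handler(int sig) {
    void *frames[64];
    int n = backtrace(frames, 64);
    /* backtrace_symbols_fd() writes straight to the fd, avoiding the
     * malloc() that backtrace_symbols() would do inside a handler. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    signal(sig, SIG_DFL);   /* restore the default action and re-raise */
    raise(sig);             /* so the kernel still logs it in dmesg */
}

/* registered early in main(): signal(SIGSEGV, crash_handler); */

And if attaching a debugger ever becomes practical despite the frequency, 
I assume the GDB route you described amounts to roughly:

$ gdb -p $(pidof memcached)
(gdb) handle SIGPIPE nostop noprint pass
(gdb) handle SIGINT nostop noprint pass
(gdb) continue
...then "bt" once it finally faults.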


Out of 205 servers, 35 had dmesg logs after a memcached crash, and only 
one crashed twice, both times on the original segfault. Below is the full 
unified set of dmesg logs, from which you can get a sense of the 
frequency.
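
(As a reading aid: the bracketed timestamps are seconds since boot, so 
these crashes range from roughly 12.4 hours of uptime at [44742.175894] 
to roughly 29 hours at [104247.724294].)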


[47992.109269] memcached[2798]: segfault at 0 ip 000000000040e007 sp 00007f4d20d25eb0 error 4 in memcached[400000+1d000]
[48960.851278] memcached[2805]: segfault at 0 ip 000000000040e007 sp 00007f3c30d15eb0 error 4 in memcached[400000+1d000]
[46421.604609] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007fdb94612eb0 error 4 in memcached[400000+1d000]
[48429.671534] traps: memcached[2768] general protection ip:40e013 sp:7f1c32676be0 error:0 in memcached[400000+1d000]
[71838.979269] memcached[2792]: segfault at 0 ip 000000000040e007 sp 00007f0162feeeb0 error 4 in memcached[400000+1d000]
[66763.091475] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f8240170eb0 error 4 in memcached[400000+1d000]
[102544.376092] traps: memcached[2792] general protection ip:40eff4 sp:7fa58095be18 error:0 in memcached[400000+1d000]
[49932.757825] memcached[2777]: segfault at 0 ip 000000000040e007 sp 00007f1ff2131eb0 error 4 in memcached[400000+1d000]
[50400.415878] memcached[2794]: segfault at 0 ip 000000000040e007 sp 00007f11a26daeb0 error 4 in memcached[400000+1d000]
[48986.340345] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f9235279eb0 error 4 in memcached[400000+1d000]
[44742.175894] memcached[2796]: segfault at 0 ip 000000000040e007 sp 00007eff3a0cceb0 error 4 in memcached[400000+1d000]
[49030.431879] memcached[2776]: segfault at 0 ip 000000000040e007 sp 00007fdef27cfbe0 error 4 in memcached[400000+1d000]
[50211.611439] traps: memcached[2782] general protection ip:40e013 sp:7f9ee1723be0 error:0 in memcached[400000+1d000]
[62534.892817] memcached[2783]: segfault at 0 ip 000000000040e007 sp 00007f37f2d4beb0 error 4 in memcached[400000+1d000]
[78697.201195] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f696ef1feb0 error 4 in memcached[400000+1d000]
[48922.246712] memcached[2804]: segfault at 0 ip 000000000040e007 sp 00007f1ebb338eb0 error 4 in memcached[400000+1d000]
[52170.371014] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f5e62fcbeb0 error 4 in memcached[400000+1d000]
[69531.775868] memcached[2785]: segfault at 0 ip 000000000040e007 sp 00007ff50ac2eeb0 error 4 in memcached[400000+1d000]
[48926.661559] memcached[2799]: segfault at 0 ip 000000000040e007 sp 00007f71e0ac6be0 error 4 in memcached[400000+1d000]
[49491.126885] memcached[2745]: segfault at 0 ip 000000000040e007 sp 00007f5737c4beb0 error 4 in memcached[400000+1d000]
[104247.724294] traps: memcached[2793] general protection ip:40f7c4 sp:7f3af8c27eb0 error:0 in memcached[400000+1d000]
[78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
[71958.385432] memcached[2809]: segfault at 0 ip 000000000040e007 sp 00007f8b68cd0eb0 error 4 in memcached[400000+1d000]
[48934.182852] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f0aef774eb0 error 4 in memcached[400000+1d000]
[104220.754195] traps: memcached[2802] general protection ip:40f7c4 sp:7ffa85a2deb0 error:0 in memcached[400000+1d000]
[45807.670246] memcached[2755]: segfault at 0 ip 000000000040e007 sp 00007fd74a1d0eb0 error 4 in memcached[400000+1d000]
[73640.102621] memcached[2802]: segfault at 0 ip 000000000040e007 sp 00007f7bb30bfeb0 error 4 in memcached[400000+1d000]
[67690.640196] memcached[2787]: segfault at 0 ip 000000000040e007 sp 00007f299580feb0 error 4 in memcached[400000+1d000]
[57729.895442] memcached[2786]: segfault at 0 ip 000000000040e007 sp 00007f204073deb0 error 4 in memcached[400000+1d000]
[48009.284226] memcached[2801]: segfault at 0 ip 000000000040e007 sp 00007f7b30876eb0 error 4 in memcached[400000+1d000]
[48198.211826] memcached[2811]: segfault at 0 ip 000000000040e007 sp 00007fd496d79eb0 error 4 in memcached[400000+1d000]
[84057.439927] traps: memcached[2804] general protection ip:40f7c4 sp:7fbe75fffeb0 error:0 in memcached[400000+1d000]
[50215.489124] memcached[2784]: segfault at 0 ip 000000000040e007 sp 00007f3234b73eb0 error 4 in memcached[400000+1d000]
[46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
[102076.523474] memcached[29833]: segfault at 0 ip 000000000040e007 sp 00007f3c89b9ebe0 error 4 in memcached[400000+1d000]
[55537.568254] memcached[2780]: segfault at 0 ip 000000000040e007 sp 00007fc1f6005eb0 error 4 in memcached[400000+1d000]
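
To save cross-referencing, the distinct instruction pointers above resolve 
(per the addr2line runs in this thread) to:

ip 40e007 -> slabs.c:264 (the original, most common segfault)
ip 40e013 -> slabs.c:265 (discriminator 1)
ip 40eff4 -> items.c:312 (discriminator 1)
ip 40f7c4 -> items.c:1183
ip 412b9d -> assoc.c:119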




On Thursday, October 1, 2015 at 5:40:35 PM UTC-7, Dormando wrote:
> got it. that might be a decent hint actually... I had added a bugfix
> to the branch to not miscount the mem_requested counter, but it's not
> working or I missed a spot.

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The number now, after maybe 90 minutes of writes, is 1,446. I think
> after disabling it, a lot of the data TTL'd out. I have to disable it
> for now, again (for unrelated reasons, again). The page that I
> screenshotted gives real-time data, so the numbers were from right
> then. Last night it should have shown better numbers in terms of
> "total_pages", but I didn't get a screenshot. That number is directly
> from the stats slabs output.

On Thursday, October 1, 2015 at 4:21:42 PM UTC-7, Dormando wrote:
> ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
> mem_requested. is this stat wrong?

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The ones that crashed (new code cluster) were set to only be written
> to from the client applications. The data is an index key and a
> series of data keys that are all written one after another. Each key
> might be hashed to a different server, though, so not all of them are
> written to the same server. I can give you a snapshot of one of the
> clusters that didn't crash (attached file). I can give more detail
> offline if you need it.

On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:
> Any chance you could describe (perhaps privately?) in very broad
> strokes what the write load looks like? (they're getting only writes,
> too?). otherwise I'll have to devise arbitrary torture tests. I'm
> sure the bug's in there but it's not obvious yet

On Thu, 1 Oct 2015, dormando wrote:
> perfect, thanks! I have $dayjob as well but will look into this as
> soon as I can. my torture test machines are in a box but I'll try to
> borrow one

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> Yes. Exact args:
>
> -p 11211 -u <omitted> -l 0.0.0.0 -c 100000 -o slab_reassign -o
> lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253

On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
> Were lru_maintainer/lru_crawler/etc enabled though? even if slab
> mover is off, those two were the big changes in .24

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The same cluster has > 400 servers happily running 1.4.24. It's been
> our standard deployment for a while now, and we haven't seen any
> crashes. The servers in the same cluster running 1.4.24 (with the
> same write load the new build was taking) have been up for 29 days.
> The start options do not contain the slab_automove option because it
> wasn't effective for us before. The memory given is possibly slightly
> different per server, as we calculate on startup how much we give.
> It's in the same ballpark, though (~56 gigs).

On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> Just before I sit in and try to narrow this down: have you run any
> host on 1.4.24 mainline with those same start options? just in case
> the crash is older

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> Another message for you:
>
> [78098.528606] traps: memcached[2757] general protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[400000+1d000]
>
> addr2line shows:
>
> $ addr2line -e memcached 412b9d
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119

On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
> Ok, thanks!
>
> I'll noodle this a bit... unfortunately a backtrace might be more
> helpful. will ask you to attempt to get one if I don't figure
> anything out in time.
>
> (allow it to core dump or attach a GDB session and set an ignore
> handler for sigpipe/int/etc and run "continue")
>
> what were your full startup args, though?

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
> As well, this was running with production writes, but not reads. Even
> if we had reads on with the few servers crashing, we're ok
> architecturally. That's why I can get it out there without worrying
> too much. For now, I'm going to turn it off. I had a metrics issue
> anyway that needs to get fixed. Tomorrow I'm planning to test again
> with more metrics, but I can get any new code in pretty quick.

On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
> How many servers were you running it on? I hope it wasn't more than a
> handful. I'd recommend starting with one :P
>
> can you do an addr2line? what were your startup args, and what was
> the commit sha1 for the branch you pulled?
>
> sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:
> A few different servers (5 / 205) experienced a segfault all within
> an hour or so. Unfortunately at this point I'm a bit out of my depth.
> I have the dmesg output, which is identical for all 5 boxes:
>
> [46545.316351] memcached[2789]: segfault at 0 ip 000000000040e007 sp 00007f362ceedeb0 error 4 in memcached[400000+1d000]
>
> I can possibly supply the binary file if needed, though we didn't do
> anything besides the standard setup and compile.

On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
> If you look at the new branch there's a commit explaining the new
> stats.
>
> You can watch slab_reassign_evictions vs slab_reassign_saves. you can
> also test automove=1 vs automove=2 (please also turn on the
> lru_maintainer and lru_crawler).
>
> The initial branch you were running didn't add any new stats. It just
> restored an old feature.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> An unrelated prod problem meant I had to stop after about an hour.
> I'm turning it on again tomorrow morning. Are there any new metrics I
> should be looking at? Anything new in the stats output? I'm about to
> take a look at the diffs as well.

On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
> excellent. if automove=2 is too aggressive you'll see that come in in
> a hit ratio reduction.
>
> the new branch works with automove=2 as well, but it will attempt to
> rescue valid items in the old slab if possible. I'll still be working
> on it for another few hours today though. I'll mail again when I'm
> done.

On Tue, 29 Sep 2015, Scott Mansfield wrote:
> I have the first commit (slab_automove=2) running in prod right now.
> Later today will be a full load production test of the latest code.
> I'll just let it run for a few days unless I spot any problems. We
> have good metrics for latency et al. from the client side, though
> network normally dwarfs memcached time.

On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
> That's unfortunate.
>
> I've done some more work on the branch:
> https://github.com/memcached/memcached/pull/112
>
> It's not completely likely you would see enough of an improvement
> from the new default mode. However if your item sizes change
> gradually, items are reclaimed during expiration, or get overwritten
> (and thus freed in the old class), it should work just fine. I have
> another patch coming which should help though.
>
> Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:
> I have it running internally, and it runs fine under normal load.
> It's difficult to put it into the line of fire for a production
> workload because of social reasons... As well, it's a degenerate case
> that we normally don't run in to (and actively try to avoid). I'm
> going to run some heavier load tests on it today.

On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
> I'm working on getting a test going internally. I'll let you know how
> it goes.

On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
> Yo,
>
> https://github.com/dormando/memcached/commits/slab_rebal_next - would
> you mind playing around with the branch here? You can see the start
> options in the test.
>
> This is a dead simple modification (a restoration of a feature that
> was already there...). The test very aggressively writes and is able
> to shunt memory around appropriately.
>
> The work I'm exploring right now will allow savings of items being
> rebalanced from, and increasing the aggression of page moving without
> being so brain damaged about it.
>
> But while I'm poking around with that, I'd be interested in knowing
> if this simple branch is an improvement, and if so how much.
>
> I'll push more code to the branch, but the changes should be gated
> behind a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
> No worries man, you're doing us a favor. Let me know if there's
> anything you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> wrote:
> Hey,
>
> I'm still really interested in working on this. I'll be taking a
> careful look soon I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> I've tweaked the program slightly, so I'm adding a new version. It
> prints more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
> Total brain fart on my part. Apparently I had memcached 1.4.13 on my
> path (who knows how...). Using the actual one that I've built works.
> Sorry for the confusion... can't believe I didn't realize that
> before. I'm testing against the compiled one now to see how it
> behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
> You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:
> The command line I've used that will start is:
>
> memcached -m 64 -o slab_reassign,slab_automove
>
> the ones that fail are:
>
> memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
>
> memcached -o lru_crawler
>
> I'm sure I've missed something during compile, though I just used
> ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
> I've attached a pretty simple program to connect, fill a slab with
> data, and then fill another slab slowly with data of a different
> size. I've been trying to get memcached to run with the lru_crawler
> and lru_maintainer flags, but I get 'Illegal suboption "(null)"'
> every time I try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly
> installed 1.4.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
> I realize I've not given you the tests to reproduce the behavior. I
> should be able to soon. Sorry about the delay here.
>
> In the meantime, I wanted to bring up a possible secondary use of the
> same logic that moves items on slab rebalance. I think the system
> might benefit from using that logic to crawl the pages in a slab and
> compact the data in the background. In the case where we have memory
> that is assigned to the slab but not being used, because of replaced
> or TTL'd out data, returning the memory to a pool of free memory
> would allow a slab to grow with that memory first instead of waiting
> for an event where memory is needed at that instant.
>
> It's a change in approach, from reactive to proactive. What do you
> think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
> > First, more detail for you:
> >
> > We are running 1.4.24 in production and haven't noticed any bugs as
> > of yet. The new LRUs seem to be working well, though we nearly
> > always run memcached scaled to hold all data without evictions.
> > Those with evictions are behaving well. Those without evictions
> > haven't seen crashing or any other noticeable bad behavior.
>
> Neat.
>
> > OK, I think I see an area where I was speculating on functionality.
> > If you have a key in slab 21 and then the same key is written again
> > at a larger size in slab 23, I assumed that the space in 21 was not
> > freed on the second write. With that assumption, the LRU crawler
> > would not free up that space. Also, just by observation in the
> > macro, the space is not freed fast enough to be effective, in our
> > use case, to accept the writes that are happening. Think in the
> > hundreds of millions of "overwrites" in a 6-10 hour period across a
> > cluster.
>
> Internally, "items" (a key/value pair) are generally immutable. The
> only time when it's not is for INCR/DECR, and it still becomes
> immutable if two INCR/DECR's collide.
>
> What this means is that the new item is staged in a piece of free
> memory while the "upload" stage of the SET happens. When memcached
> has all of the data in memory to replace the item, it does an
> internal swap under a lock. The old item is removed from the hash
> table and LRU, and the new item gets put in its place (at the head of
> the LRU).
>
> Since items are refcounted, this means that if other users are
> downloading an item which just got replaced, their memory doesn't get
> corrupted by the item changing out from underneath them. They can
> continue to read the old item until they're done. When the refcount
> reaches zero the old memory is reclaimed.
>
> Most of the time, the item replacement happens and then the old
> memory is immediately removed.
>
> However, this does mean that you need *one* piece of free memory to
> replace the old one. Then the old memory gets freed after that set.
>
> So if you take a memcached instance with 0 free chunks, and do a
> rolling replacement of all items (within the same slab class as
> before), the first one would cause an eviction from the tail of the
> LRU to get a free chunk. Every SET after that would use the chunk
> freed from the replacement of the previous memory.
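
(An aside from me, to check my understanding: the replace flow described 
above would look roughly like the pseudo-C below. Every name here is made 
up for illustration; this is not memcached's actual code.)

#include <pthread.h>

typedef struct item {
    struct item *prev, *next;    /* LRU links */
    int refcount;                /* cache's reference + active readers */
    /* ... key, flags, value ... */
} item;

/* hypothetical helpers, assumed to exist elsewhere: */
void hash_delete(item *it);
void hash_insert(item *it);
void lru_unlink(item *it);
void lru_link_head(item *it);
void slabs_free(item *it);

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* new_it has already been fully staged in a free chunk before this */
void item_replace(item *old_it, item *new_it) {
    pthread_mutex_lock(&cache_lock);
    hash_delete(old_it);          /* unlink old from the hash table */
    lru_unlink(old_it);           /* ...and from the LRU */
    hash_insert(new_it);          /* new item takes its place */
    lru_link_head(new_it);        /* at the head of the LRU */
    if (--old_it->refcount == 0)  /* drop the cache's own reference */
        slabs_free(old_it);       /* no readers left: reclaim now */
    pthread_mutex_unlock(&cache_lock);
}

/* Readers do the mirror image when done: decrement the refcount and
 * free the old memory once it hits zero. */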
> > After that last sentence I realized I also may not have explained
> > well enough the access pattern. The keys are all overwritten every
> > day, but it takes some time to write them all (obviously). We see a
> > huge increase in the bytes metric as if the new data for the old
> > keys was being written for the first time. Since the "old" slab for
> > the same key doesn't proactively release memory, it starts to fill
> > up the cache and then starts evicting data in the new slab. Once
> > that happens, we see evictions in the old slab because of the
> > algorithm you mentioned (random picking / freeing of memory).
> > Typically we don't see any use for "upgrading" an item, as the new
> > data would be entirely new and should wholesale replace the old
> > data for that key. More specifically, the operation is always set,
> > with different data each day.
>
> Right. Most of your problems will come from two areas. One: when
> writing data aggressively into the new slab class (unless you set the
> rebalancer to always-replace mode), the mover will make memory
> available more slowly than you can insert, so you'll cause extra
> evictions in the new slab class.
>
> The secondary problem is from the random evictions in the previous
> slab class as stuff is chucked on the floor to make memory moveable.
>
> > As for testing, we'll be able to put it under real production
> > workload. I don't know what kind of data you mean you need for
> > testing. The data stored in the caches is highly confidential. I
> > can give you all kinds of metrics, since we collect most of the
> > ones that are in the stats and some from the stats slabs output. If
> > you have some specific ones that need collecting, I'll double check
> > and make sure we can get those. Alternatively, it might be most
> > beneficial to see the metrics in person :)
>
> I just need stats snapshots here and there, and actually putting the
> thing under load. When I did the LRU work I had to beg for several
> months before anyone tested it with a production load. This slows
> things down and ...
