Can probably get rid of that since I added the "juggles" stat, and/or
rename it to maintainer_runs or something... it was useful for seeing if I'd
hung the thread.

On Tue, 20 Jan 2015, Eric McConville wrote:

> This is more of a comment, but I noticed that when debugging with the
> lru_maintainer option under extreme verbosity (-vvv), I get an endless
> stream of running/sleeping messages.
>
>     ~> ./memcached -vvv -o lru_maintainer
>     // ... slab start-up ...
>     LRU maintainer thread running
>     LRU maintainer thread sleeping
>     LRU maintainer thread running
>     LRU maintainer thread sleeping
>     LRU maintainer thread running
>     LRU maintainer thread sleeping
>     // ... endless...
>
> Expected, but a bit annoying.
>
> On Tue, Jan 20, 2015 at 12:37 AM, dormando <[email protected]> wrote:
>       Thanks!
>
>       No crashes is interesting/useful at least? No errors or other problems?
>
>       I'm still hoping someone can side-by-side in production with the
>       recommended settings. I can come up with synthetic tests all day and it
>       doesn't educate in the same way.
>
>       On Tue, 20 Jan 2015, Zhiwei Chan wrote:
>
>       > Test result:
>       >   I ran this test last night; the results are as follows:
>       > 1. Environment:
>       > [root@jason3 code]# lsb_release -a
>       > LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>       > Distributor ID: CentOS
>       > Description: CentOS release 6.5 (Final)
>       > Release: 6.5
>       > Codename: Final
>       > [root@jason3 code]# free
>       >              total       used       free     shared    buffers     cached
>       > Mem:       8003888    3434536    4569352          0     263324    1372600
>       > -/+ buffers/cache:    1798612    6205276
>       > Swap:      8142840      11596    8131244
>       > [root@jason3 code]# cat /proc/cpuinfo 
>       > processor : 0
>       > vendor_id : GenuineIntel
>       > cpu family : 6
>       > model : 58
>       > model name : Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
>       > stepping : 9
>       > cpu MHz : 1600.000
>       > cache size : 8192 KB
>       > ... (4 cores)
>       >
>       > 2. Running options:
>       > [root@jason3 code]# ps -ef|grep memcached-
>       > root      7898     1 11 Jan19 ?        02:12:46 ./memcached-master -c 10240 -o tail_repair_time=7200 -m 64 -u root -p 33333 -d
>       > root      8092     1 11 Jan19 ?        02:11:22 ./memcached-lrurework -d -c 10240 -o lru_maintainer lru_crawler -m 64 -u root -p 44444
>       > root     10265  9447  0 11:30 pts/1    00:00:00 grep memcached-
>       > root     10325     1 11 Jan19 ?        02:06:14 ./memcached-release -d -c 10240 -m 64 -u root -p 55555 -o slab_reassign lru_crawler slab_automove=3 release_mem_sleep=1 release_mem_start=40 release_mem_stop=80 lru_crawler_interval=3600
>       >   
>       > memcached-master: the most recent memcached from the master branch, on port 33333
>       > memcached-lrurework: the most recent lrurework branch of dormando's memcached, on port 44444
>       > memcached-release: the most recent master branch + the release-memory patch, on port 55555
>       >
>       > 3. What is the traffic mode?
>       >   It simulates the traffic distribution of one of our pools, with the expire-time and value-length distributions as follows:
>       > #the expire of keys
>       > expire_time         = [1,5,10,30,60,300,600,3600,86400,0]
>       > expire_time_weight  = [1,1, 2, 5, 8,  5,  6,   5,    3,1]
>       >
>       > #the len of value
>       > value_len         = [4,10,50,100,200,500,1000,2000,5000,10000]
>       > value_len_weight  = [3, 4, 5,  8,  8, 10,   5,   5,   2,    1]
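The weighted sampling described above can be sketched in a few lines of Python with `random.choices` (a hypothetical minimal version for illustration; the actual `compare_test.py` is not shown in this thread):

```python
import random

# Weighted pools copied from the distributions above
expire_time        = [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0]
expire_time_weight = [1, 1,  2,  5,  8,   5,   6,    5,     3, 1]

value_len        = [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
value_len_weight = [3,  4,  5,   8,   8,  10,    5,    5,    2,     1]

def random_set_params():
    """Pick an (expire, value) pair according to the weights above."""
    expire = random.choices(expire_time, weights=expire_time_weight, k=1)[0]
    length = random.choices(value_len, weights=value_len_weight, k=1)[0]
    value = b"x" * length  # filler payload of the chosen length
    return expire, value
```

Each set command then uses an independently drawn expire-time and value length, matching the "independent" note later in the thread.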
>       >
>       > Using the Python script "compare_test.py" to execute: python ./compare_test.py 192.168.116.213:33333,192.168.116.213:44444,192.168.116.213:55555
>       >
>       > I run the test process on the same machine that runs the memcached processes, so it is easy to generate a heavy workload.
>       >
>       > I collected test results over the last 12 hours, watching them in Cacti. It seems there is no difference among the three for this traffic mode.
>       > gets/sets = 9:1
>       > hit_rate ~ 50%
> > [IMAGE]
> > I also print some detailed statistics info in the test script:
> >
> > Cache list: ['192.168.116.213:33333', '192.168.116.213:44444', '192.168.116.213:55555']
> > send_key_number: 127306   -------> number of unique keys
> > test_loop: 0   ---> loop forever, no limit
> > weight of get/set command: [10, 1]   ----------> the weight of get/set commands. Note: if a get misses, the key is set immediately, and that set does not count toward this weight.
> > show_interval: 10    --- the interval for showing statistics info.
> > stats_interval: 5      --- the interval for fetching memcached's stats.
> > show_stats_interval: [60, 3600, 43200]   ----------- the time ranges shown, in seconds. e.g. "60" means "last 60s", and 3600 means "last 3600s".
> > len of values: [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]   --------> possible lengths of the values we will set to memcached.
> > weight of values' len: [3, 4, 5, 8, 8, 10, 5, 5, 2, 1]   -------> weight of each value length.
> > expire-time of keys: [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0]    ---> possible expire-times used in set commands. Independent of the value length.
> > weight of keys' expire-time: [1, 1, 2, 5, 8, 5, 6, 5, 3, 1]  --> weight of each expire-time.
> > ...
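The get/set behavior described above (a 10:1 weighted command mix, with an immediate set on every get miss that does not count toward the weight) could be sketched like this; `client` is a stand-in for any memcached client object with `get`/`set` methods, not the real script's API:

```python
import random

def run_one_op(client, keys, get_set_weight=(10, 1)):
    """Issue one weighted get-or-set against memcached.

    On a get miss the key is set immediately; that extra set does not
    count toward the get/set weight, matching the script's note.
    Assumes client.get(key) returns None on a miss and
    client.set(key, value, expire) stores a value, as in common
    memcached client libraries.
    """
    key = random.choice(keys)
    op = random.choices(["get", "set"], weights=get_set_weight, k=1)[0]
    if op == "get":
        value = client.get(key)
        if value is None:
            client.set(key, b"x" * 100, 60)  # repopulate on miss
        return op, value is not None
    client.set(key, b"x" * 100, 60)
    return op, True
```

Run in a loop over the unique key set, this converges toward the reported 9:1 gets/sets ratio once the cache is warm.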
> >
> >  #28190284 command: 281902842   --------> the first number has no meaning; the second number is the number of commands we have sent to memcached.
> >
> > All the following numbers are recorded as increments, except the second number of the items field.
> >   192.168.116.213:33333
> > [60s] gets:   523063, hit:  49%, updates:    52141, dels:        0, items:   -8/69423, read: 53891331, write: 215106364, OOMs:   0, evict:   6626
> > [3600s] gets: 29664649, hit:  49%, updates:  2966798, dels:        0, items:   13/69423, read: 3038408576, write: 12218798832, OOMs:   0, evict: 356323
> >   192.168.116.213:44444
> > [60s] gets:   523007, hit:  50%, updates:    52202, dels:        0, items:  -62/69348, read: 53528995, write: 218847446, OOMs:   0, evict:   6539
> > [3600s] gets: 29667232, hit:  50%, updates:  2964220, dels:        0, items:  -14/69348, read: 3030860658, write: 12405356058, OOMs:   0, evict: 359460
> >   192.168.116.213:55555
> > [60s] gets:   523093, hit:  49%, updates:    52116, dels:        0, items:   28/69396, read: 52993446, write: 215231210, OOMs:   0, evict:   6491
> > [3600s] gets: 29669464, hit:  49%, updates:  2961988, dels:        0, items:  -25/69396, read: 3038356827, write: 12219764097, OOMs:   0, evict: 355644
> > ...
> >
> >
> >
> > On Fri, Jan 16, 2015 at 9:29 PM, Zhiwei Chan <[email protected]> wrote:
> >         Our maintenance team tends to be conservative, especially about basic software where performance matters, so I think it is unlikely we can push this to production soon. But I have written a pretty convenient Python tool for A/B testing. The tool can generate fake traffic with random expire-times and random value lengths, can specify the weights of the different expire-times and lengths, and has lots of other functions. It is almost complete, and I can post a result next Monday.
> >
> > On Fri, Jan 16, 2015 at 11:12 AM, dormando <[email protected]> wrote:
> >       If you want?
> >
> >       What would make you confident enough to try the branch in production? Or do you rely on your other patches and that's not really possible?
> >
> >       On Thu, 15 Jan 2015, Zhiwei Chan wrote:
> >
> >       >   I tried to use real application traffic to make a comparison test, but it seems that not everyone uses a cache client with consistent hashing in the dev environment. The result is that the traffic is not distributed as well as I expected.
> >       >   Should I fake the traffic and make a comparison test instead of using real traffic? e.g., fake traffic of keys with random expire-times, set and get against memcached.
> >       >
> >       > ---------------
> >       > Host mc56 installs the most recent LRU-rework branch's memcached with options like "/usr/local/bin/memcached -u nobody -d -c 10240 -o lru_maintainer lru_crawler -m 64 -p 11811";
> >       > host mc57 installs version 1.4.20_7_gb118a6c's memcached, with options like "/usr/bin/memcached -u nobody -d -c 10240 -o tail_repair_time=7200 -m 64 -p 11811".
> >       >
> >       > I summed up the stats of all memcached instances on each host and made the following analysis:
> >       >
> >       > [inline image omitted]
> >       >
> >       > On Wed, Jan 14, 2015 at 1:58 AM, dormando <[email protected]> 
> >wrote:
> >       >       Last update to the branch was 3 days ago. I'm not planning on 
> >doing any
> >       >       more work on it at the moment, so people have a chance to 
> >test it.
> >       >
> >       >       thanks!
> >       >
> >       >       On Tue, 13 Jan 2015, Zhiwei Chan wrote:
> >       >
> >       >       > I compiled directly from your branch on the test server; please tell me when it needs updating and re-compiling.
> >       >       >
> >       >       > On Tue, Jan 13, 2015 at 4:20 AM, dormando 
> ><[email protected]> wrote:
> >       >       >       That sounds like an okay place to start. Can you please make sure the other dev server is running the very latest version of the branch? A lot changed since last Friday... a few pretty bad bugs.
> >       >       >
> >       >       >       Please use the startup options described in the middle of the PR.
> >       >       >
> >       >       >       If anyone's brave enough to try the latest branch on one production instance (if they have a low traffic one somewhere, maybe?) that'd be good. I ran the branch under a load tester for a few hours, it passes tests, etc. If I merge it, it'll just go into people's productions without ever having a production test first, so hopefully someone can try it?
> >       >       >
> >       >       >       thanks
> >       >       >
> >       >       >       On Mon, 12 Jan 2015, Zhiwei Chan wrote:
> >       >       >
> >       >       >       >   I have run it since last Friday; so far no crash. As I have finished the haproxy work today, I will try a comparison test of this LRU work tomorrow, as follows: there are two servers (CentOS 5.8, 8 cores, 8G memory) in the dev environment, and both servers run 32 memcached instances (processes) with a maximum memory of 128M. One server runs version 1.4.21, the other runs this branch. There are lots of "pools" using these memcached servers, and every pool uses two memcached instances on different servers. The pool clients use a consistent hashing algorithm to distribute keys across their two memcached instances. I will watch the hit rate and other performance metrics using Cacti.
> >       >       >       >   I think it will work, but usually there is not much traffic in our dev environment. Please tell me if you have any other advice.
> >       >       >       >
> >       >       > 2015-01-08 4:21 GMT+08:00 dormando <[email protected]>:
> >       >       >       >       Hey,
> >       >       >       >
> >       >       >       >       To all three of you: just run it anywhere you can (but not more than one machine, yet?), with the options prescribed in the PR. Ideally you have graphs of the hit ratio and maybe cache fullness and can compare before/after.
> >       >       >       >
> >       >       >       >       And let me know if it hangs or crashes, obviously. If so a backtrace and/or coredump would be fantastic.
> >       >       >       >
> >       >       >       >       On Thu, 8 Jan 2015, Zhiwei Chan wrote:
> >       >       >       >
> >       >       >       >       >   I will deploy it to one of our test environments on CentOS 5.8, for a comparison test against 1.4.21, although the workload is not as heavy as in the production environment. Tell me if there is anything I can help with.
> >       >       >       >       >
> >       >       >       >       > 2015-01-07 23:30 GMT+08:00 Eric McConville <[email protected]>:
> >       >       >       >       >       Same here. Do you want any findings posted to the mailing list, or the PR thread?
> >       >       >       >       >
> >       >       >       >       > On Wed, Jan 7, 2015 at 5:56 AM, Ryan McCullagh <[email protected]> wrote:
> >       >       >       >       >       I'm willing to help out in any way possible. What can I do?
> >       >       >       >       >
> >       >       >       >       >       -----Original Message-----
> >       >       >       >       >       From: [email protected] [mailto:[email protected]] On Behalf Of dormando
> >       >       >       >       >       Sent: Wednesday, January 7, 2015 3:52 AM
> >       >       >       >       >       To: [email protected]
> >       >       >       >       >       Subject: memory efficiency / LRU refactor branch
> >       >       >       >       >
> >       >       >       >       >       Yo,
> >       >       >       >       >
> >       >       >       >       >       https://github.com/memcached/memcached/pull/97
> >       >       >       >       >
> >       >       >       >       >       Opening to a wider audience. I need some folks willing to poke at it and see if their workloads fare better or worse with respect to hit ratios.
> >       >       >       >       >
> >       >       >       >       >       The rest of the work remaining on my end is more testing, and some TODOs noted in the PR. The remaining work is relatively small aside from the page mover idea. It hasn't been crashing or hanging in my testing so far, but that might still happen.
> >       >       >       >       >
> >       >       >       >       >       I can't/won't merge this until I get some evidence that it's useful. Hoping someone out there can lend a hand. I don't know what the actual impact would be, but for some workloads it could be large. Even for folks who have set all items to never expire, it could still potentially improve hit ratios by better protecting active items.
> >       >       >       >       >
> >       >       >       >       >       It will work best if you at least have a mix of items with TTLs that expire in reasonable amounts of time.
> >       >       >       >       >
> >       >       >       >       >       thanks,
> >       >       >       >       >       -Dormando
> >       >       >       >       >
> >       >       >       >       > --
> >       >       >       >       >
> >       >       >       >       > ---
> >       >       >       >       > You received this message because you are subscribed to the Google Groups "memcached" group.
> >       >       >       >       > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >       >       >       >       > For more options, visit https://groups.google.com/d/optout.
> >       >       >       >       >
> >       >       >       >
> >       >       >       >
> >       >       >
> >       >       >
> >       >
> >       >
> >
> >
> >
>
>
