This is more of a comment, but I noticed that when running with the
lru_maintainer option under extreme verbosity (-vvv), I get an endless
stream of running/sleeping messages:
~> ./memcached -vvv -o lru_maintainer
// ... slab start-up ...
LRU maintainer thread running
LRU maintainer thread sleeping
LRU maintainer thread running
LRU maintainer thread sleeping
LRU maintainer thread running
LRU maintainer thread sleeping
// ... endless...
Expected, but a bit annoying.
On Tue, Jan 20, 2015 at 12:37 AM, dormando <[email protected]> wrote:
> Thanks!
>
> No crashes is interesting/useful at least? No errors or other problems?
>
> I'm still hoping someone can side-by-side in production with the
> recommended settings. I can come up with synthetic tests all day and it
> doesn't educate in the same way.
>
> On Tue, 20 Jan 2015, Zhiwei Chan wrote:
>
> > Test result:
> > I ran this test last night; the results are as follows:
> > 1. environment:
> > [root@jason3 code]# lsb_release -a
> > LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> > Distributor ID: CentOS
> > Description: CentOS release 6.5 (Final)
> > Release: 6.5
> > Codename: Final
> > [root@jason3 code]# free
> > total used free shared buffers cached
> > Mem: 8003888 3434536 4569352 0 263324 1372600
> > -/+ buffers/cache: 1798612 6205276
> > Swap: 8142840 11596 8131244
> > [root@jason3 code]# cat /proc/cpuinfo
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 58
> > model name : Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
> > stepping : 9
> > cpu MHz : 1600.000
> > cache size : 8192 KB
> > .... (4 cores total)
> >
> > 2. running option:
> > [root@jason3 code]# ps -ef|grep memcached-
> > root 7898 1 11 Jan19 ? 02:12:46 ./memcached-master -c 10240 -o tail_repair_time=7200 -m 64 -u root -p 33333 -d
> > root 8092 1 11 Jan19 ? 02:11:22 ./memcached-lrurework -d -c 10240 -o lru_maintainer lru_crawler -m 64 -u root -p 44444
> > root 10265 9447 0 11:30 pts/1 00:00:00 grep memcached-
> > root 10325 1 11 Jan19 ? 02:06:14 ./memcached-release -d -c 10240 -m 64 -u root -p 55555 -o slab_reassign lru_crawler slab_automove=3 release_mem_sleep=1 release_mem_start=40 release_mem_stop=80 lru_crawler_interval=3600
> >
> > memcached-master: the latest memcached from the master branch, on port 33333
> > memcached-lrurework: the latest lrurework branch of dormando's memcached, on port 44444
> > memcached-release: the latest master branch plus the release-memory patch, on port 55555
> >
> > 3. What is the traffic mode?
> > It simulates the traffic distribution of one of our pools, with the expire-time and value-length distributions as follows:
> > #the expire of keys
> > expire_time = [1,5,10,30,60,300,600,3600,86400,0]
> > expire_time_weight = [1,1, 2, 5, 8, 5, 6, 5, 3,1]
> >
> > #the len of value
> > value_len = [4,10,50,100,200,500,1000,2000,5000,10000]
> > value_len_weight = [3, 4, 5, 8, 8, 10, 5, 5, 2, 1]
> >
> > Using the Python script "compare_test.py" to execute: python ./compare_test.py 192.168.116.213:33333,192.168.116.213:44444,192.168.116.213:55555
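For reference, the weighted sampling described above is easy to sketch in Python with `random.choices`; this is a minimal sketch, and the function name `make_set_args` is illustrative, not taken from compare_test.py:

```python
import random

# Distributions copied from the test description above.
expire_time = [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0]
expire_time_weight = [1, 1, 2, 5, 8, 5, 6, 5, 3, 1]

value_len = [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
value_len_weight = [3, 4, 5, 8, 8, 10, 5, 5, 2, 1]

def make_set_args():
    """Pick an expire time and a value length independently, by weight."""
    ttl = random.choices(expire_time, weights=expire_time_weight)[0]
    vlen = random.choices(value_len, weights=value_len_weight)[0]
    return ttl, b"x" * vlen
```

Each call yields one (ttl, value) pair for a simulated set command, with expire time and value length drawn independently, matching the description above.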
> >
> > I ran the test process on the same machine that runs the memcached processes, so that it is easy to generate a heavy workload.
> >
> > I have the test results for the last 12 hours, watched in Cacti. It seems that there is no difference between the three for this traffic mode.
> > gets/sets = 9:1
> > hit_rate ~ 50%
> > [IMAGE]
> > I also print some detail statistics info in the test script:
> >
> > Cache list: ['192.168.116.213:33333', '192.168.116.213:44444', '192.168.116.213:55555']
> > send_key_number: 127306 --------> number of unique keys
> > test_loop: 0 --------> loop forever, no limit
> > weight of get/set commands: [10, 1] --------> the weights of the get/set commands. Note: if a get misses, the key is set immediately; that set is not counted into this weight.
> > show_interval: 10 --------> the interval for showing statistics info
> > stats_interval: 5 --------> the interval for fetching the stats of memcached
> > show_stats_interval: [60, 3600, 43200] --------> the time ranges shown, in seconds; e.g. "60" means "last 60s", and 3600 means "last 3600s"
> > len of values: [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000] --------> possible lengths of the values we set to memcached
> > weight of values' len: [3, 4, 5, 8, 8, 10, 5, 5, 2, 1] --------> weights of the different value lengths
> > expire-time of keys: [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0] --------> possible expire times used in set commands, independent of the value length
> > weight of keys' expire-time: [1, 1, 2, 5, 8, 5, 6, 5, 3, 1] --------> weights of the different expire times
> > ...
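The get/set weighting with the immediate backfill on a miss, as described in the notes above, could be driven by a loop roughly like this; `FakeClient` and `run_one_op` are hypothetical stand-ins for illustration, not the real script's code:

```python
import random

class FakeClient:
    """Dict-backed stand-in for a real memcached client library."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl=0):
        self.store[key] = value

def run_one_op(client, keys, get_weight=10, set_weight=1):
    """One driver step: choose get vs set by weight [10, 1]; on a get
    miss, set the key immediately (the backfill is outside the weight)."""
    key = random.choice(keys)
    op = random.choices(["get", "set"], weights=[get_weight, set_weight])[0]
    if op == "get":
        if client.get(key) is None:
            client.set(key, b"value")   # immediate backfill on miss
            return "miss"
        return "hit"
    client.set(key, b"value")
    return "set"
```

Counting the returned "hit"/"miss" results over an interval gives the hit rate reported below.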
> >
> > #28190284 command: 281902842 --------> the first number has no meaning; the second number is the number of commands we have sent to memcached.
> >
> > All the following numbers are recorded as increments, except the second number of the "items" field.
> > 192.168.116.213:33333
> > [60s] gets: 523063, hit: 49%, updates: 52141, dels: 0, items: -8/69423, read: 53891331, write: 215106364, OOMs: 0, evict: 6626
> > [3600s] gets: 29664649, hit: 49%, updates: 2966798, dels: 0, items: 13/69423, read: 3038408576, write: 12218798832, OOMs: 0, evict: 356323
> > 192.168.116.213:44444
> > [60s] gets: 523007, hit: 50%, updates: 52202, dels: 0, items: -62/69348, read: 53528995, write: 218847446, OOMs: 0, evict: 6539
> > [3600s] gets: 29667232, hit: 50%, updates: 2964220, dels: 0, items: -14/69348, read: 3030860658, write: 12405356058, OOMs: 0, evict: 359460
> > 192.168.116.213:55555
> > [60s] gets: 523093, hit: 49%, updates: 52116, dels: 0, items: 28/69396, read: 52993446, write: 215231210, OOMs: 0, evict: 6491
> > [3600s] gets: 29669464, hit: 49%, updates: 2961988, dels: 0, items: -25/69396, read: 3038356827, write: 12219764097, OOMs: 0, evict: 355644
> > ...
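For anyone reproducing the hit-rate column, memcached's plain-text `stats` reply (lines of the form `STAT name value`, terminated by `END`) reduces to a hit ratio with a small helper; `parse_stats` and `hit_rate` are illustrative names, a sketch rather than the test script's actual code:

```python
def parse_stats(text):
    """Turn memcached's `stats` reply into a name -> value dict."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            out[parts[1]] = parts[2]
    return out

def hit_rate(stats):
    """get_hits / (get_hits + get_misses), or 0.0 with no traffic."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

Taking the difference of the counters between two samples gives the per-interval rates shown above, rather than the rate since startup.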
> >
> >
> >
> > On Fri, Jan 16, 2015 at 9:29 PM, Zhiwei Chan <[email protected]> wrote:
> > Our maintenance team tends to be conservative, especially about performance-critical base software, so I think it is unlikely we can push this to production soon. But I wrote a pretty convenient tool in Python for an A/B test. The tool can fake traffic with random expire times and random value lengths, can specify the weights of the different expire times and lengths, and has lots of other functions. It is almost complete, and I can post a result next Monday.
> >
> > On Fri, Jan 16, 2015 at 11:12 AM, dormando <[email protected]> wrote:
> > If you want?
> >
> > What would make you confident enough to try the branch in production?
> > Or do you rely on your other patches and that's not really possible?
> >
> > On Thu, 15 Jan 2015, Zhiwei Chan wrote:
> >
> > > I tried to use real application traffic for a comparison test, but it seems that not everyone uses a cache client with consistent hashing in the dev environment. The result is that the traffic was not distributed as well as I expected.
> > > Should I fake the traffic for the comparison test instead of using real traffic? E.g., fake traffic with random expire-time keys to set and get against memcached.
> > >
> > > ---------------
> > > Host mc56 runs the latest LRU-rework branch's memcached, with options like "/usr/local/bin/memcached -u nobody -d -c 10240 -o lru_maintainer lru_crawler -m 64 -p 11811";
> > > host mc57 runs version 1.4.20_7_gb118a6c's memcached, with options like "/usr/bin/memcached -u nobody -d -c 10240 -o tail_repair_time=7200 -m 64 -p 11811".
> > >
> > > I summed up the stats of all memcached instances on each host and made the following analysis:
> > >
> > > Inline image 1
> > >
> > > On Wed, Jan 14, 2015 at 1:58 AM, dormando <[email protected]>
> wrote:
> > > Last update to the branch was 3 days ago. I'm not planning
> on doing any
> > > more work on it at the moment, so people have a chance to
> test it.
> > >
> > > thanks!
> > >
> > > On Tue, 13 Jan 2015, Zhiwei Chan wrote:
> > >
> > > > I compiled directly from your branch on the test server; please tell me if it needs an update and a recompile.
> > > >
> > > > On Tue, Jan 13, 2015 at 4:20 AM, dormando <
> [email protected]> wrote:
> > > > That sounds like an okay place to start. Can you
> please make sure the
> > > > other dev server is running the very latest
> version of the branch? A lot
> > > > changed since last friday... a few pretty bad bugs.
> > > >
> > > > Please use the startup options described in the
> middle of the PR.
> > > >
> > > > If anyone's brave enough to try the latest branch
> on one production
> > > > instance (if they have a low traffic one
> somewhere, maybe?) that'd be
> > > > good. I ran the branch under a load tester for a
> few hours, it passes
> > > > tests, etc. If I merge it, it'll just go into
> people's productions without
> > > > ever having a production test first, so hopefully
> someone can try it?
> > > >
> > > > thanks
> > > >
> > > > On Mon, 12 Jan 2015, Zhiwei Chan wrote:
> > > >
> > > > > I have run it since last Friday; so far no crash. As I finished the haproxy work today, I will try a comparison test of this LRU work tomorrow, as follows: there are two servers (CentOS 5.8, 8 cores, 8 GB memory) in the dev environment, and both run 32 memcached instances (processes) with a maximum memory of 128M. One server runs version 1.4.21, the other runs this branch. There are lots of "pools" using these memcached servers, and every pool uses two memcached instances on different servers. The pools' clients use a consistent-hash algorithm to distribute keys to their two memcached instances. I will watch the hit rate and other performance metrics using Cacti.
> > > > > I think it will work, but usually there is not much traffic in our dev environment. Please tell me if you have any other advice.
> > > > >
> > > > >
> > > > > 2015-01-08 4:21 GMT+08:00 dormando <
> [email protected]>:
> > > > > Hey,
> > > > >
> > > > > To all three of you: Just run it anywhere
> you can (but not more than one
> > > > > machine, yet?), with the options
> prescribed in the PR. Ideally you have
> > > > > graphs of the hit ratio and maybe cache
> fullness and can compare
> > > > > before/after.
> > > > >
> > > > > And let me know if it hangs or crashes,
> obviously. If so a backtrace
> > > > > and/or coredump would be fantastic.
> > > > >
> > > > > On Thu, 8 Jan 2015, Zhiwei Chan wrote:
> > > > >
> > > > > > I will deploy it to one of our test environments on CentOS 5.8 for a comparison test with 1.4.21, although the workload is not as heavy as in the production environment. Tell me if there is anything I can help with.
> > > > > >
> > > > > > 2015-01-07 23:30 GMT+08:00 Eric
> McConville <[email protected]>:
> > > > > > Same here. Do you want any findings posted to the mailing list, or the PR thread?
> > > > > >
> > > > > > On Wed, Jan 7, 2015 at 5:56 AM, Ryan
> McCullagh <[email protected]> wrote:
> > > > > > I'm willing to help out in any way
> possible. What can I do?
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: [email protected]
> [mailto:[email protected]] On
> > > > > > Behalf Of dormando
> > > > > > Sent: Wednesday, January 7, 2015 3:52 AM
> > > > > > To: [email protected]
> > > > > > Subject: memory efficiency / LRU
> refactor branch
> > > > > >
> > > > > > Yo,
> > > > > >
> > > > > >
> https://github.com/memcached/memcached/pull/97
> > > > > >
> > > > > > Opening to a wider audience. I need some folks willing to poke at it and see if their workloads fare better or worse with respect to hit ratios.
> > > > > >
> > > > > > The rest of the work remaining on
> my end is more testing, and some TODO's
> > > > > > noted in the PR. The remaining
> work is relatively small aside from the page
> > > > > > mover idea. It hasn't been
> crashing or hanging in my testing so far, but
> > > > > > that might still happen.
> > > > > >
> > > > > > I can't/won't merge this until I
> get some evidence that it's useful.
> > > > > > Hoping someone out there can lend
> a hand. I don't know what the actual
> > > > > > impact would be, but for some
> workloads it could be large. Even for folks
> > > > > > who have set all items to never
> expire, it could still potentially improve
> > > > > > hit ratios by better protecting
> active items.
> > > > > >
> > > > > > It will work best if you at least
> have a mix of items with TTL's that expire
> > > > > > in reasonable amounts of time.
> > > > > >
> > > > > > thanks,
> > > > > > -Dormando
> > > > > >
--
---
You received this message because you are subscribed to the Google Groups
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.