This is more of a comment, but I noticed that when running with the
lru_maintainer option under extreme verbosity (-vvv), I get an endless
stream of running/sleeping messages:
~> ./memcached -vvv -o lru_maintainer
// ... slab start-up ...
LRU maintainer thread running
LRU maintainer thread sleeping
LRU maintainer thread running
LRU maintainer thread sleeping
LRU maintainer thread running
LRU maintainer thread sleeping
// ... endless...
Expected, but a bit annoying.
On Tue, Jan 20, 2015 at 12:37 AM, dormando <[email protected]> wrote:
> Thanks!
>
> No crashes is interesting/useful at least? No errors or other problems?
>
> I'm still hoping someone can side-by-side in production with the
> recommended settings. I can come up with synthetic tests all day and it
> doesn't educate in the same way.
>
> On Tue, 20 Jan 2015, Zhiwei Chan wrote:
>
> > Test result:
> > I ran this test last night; the results are as follows:
> > 1. environment:
> > [root@jason3 code]# lsb_release -a
> > LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> > Distributor ID: CentOS
> > Description: CentOS release 6.5 (Final)
> > Release: 6.5
> > Codename: Final
> > [root@jason3 code]# free
> > total used free shared buffers cached
> > Mem: 8003888 3434536 4569352 0 263324 1372600
> > -/+ buffers/cache: 1798612 6205276
> > Swap: 8142840 11596 8131244
> > [root@jason3 code]# cat /proc/cpuinfo
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 58
> > model name : Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
> > stepping : 9
> > cpu MHz : 1600.000
> > cache size : 8192 KB
> > .... (4 cores total)
> >
> > 2. running option:
> > [root@jason3 code]# ps -ef|grep memcached-
> > root 7898 1 11 Jan19 ? 02:12:46 ./memcached-master -c 10240 -o tail_repair_time=7200 -m 64 -u root -p 33333 -d
> > root 8092 1 11 Jan19 ? 02:11:22 ./memcached-lrurework -d -c 10240 -o lru_maintainer lru_crawler -m 64 -u root -p 44444
> > root 10265 9447 0 11:30 pts/1 00:00:00 grep memcached-
> > root 10325 1 11 Jan19 ? 02:06:14 ./memcached-release -d -c 10240 -m 64 -u root -p 55555 -o slab_reassign lru_crawler slab_automove=3 release_mem_sleep=1 release_mem_start=40 release_mem_stop=80 lru_crawler_interval=3600
> >
> > memcached-master: the latest memcached from the master branch, on port 33333
> > memcached-lrurework: the latest lrurework branch of dormando's memcached, on port 44444
> > memcached-release: the latest master branch plus the release-memory patch, on port 55555
> >
> > 3. What is the traffic mode?
> > It simulates the traffic distribution of one of our pools, with the expire-time and value-length distributions as follows:
> > #the expire of keys
> > expire_time = [1,5,10,30,60,300,600,3600,86400,0]
> > expire_time_weight = [1,1, 2, 5, 8, 5, 6, 5, 3,1]
> >
> > #the len of value
> > value_len = [4,10,50,100,200,500,1000,2000,5000,10000]
> > value_len_weight = [3, 4, 5, 8, 8, 10, 5, 5, 2, 1]
> >
> > Using the Python script "compare_test.py" to execute: python ./compare_test.py 192.168.116.213:33333,192.168.116.213:44444,192.168.116.213:55555
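For reference, the weighted sampling described above is easy to sketch in Python with `random.choices`; this is a minimal sketch, and the function name `make_set_args` is illustrative, not taken from compare_test.py:

```python
import random

# Distributions copied from the test description above.
expire_time = [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0]
expire_time_weight = [1, 1, 2, 5, 8, 5, 6, 5, 3, 1]

value_len = [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
value_len_weight = [3, 4, 5, 8, 8, 10, 5, 5, 2, 1]

def make_set_args():
    """Pick an expire time and a value length independently, by weight."""
    ttl = random.choices(expire_time, weights=expire_time_weight)[0]
    vlen = random.choices(value_len, weights=value_len_weight)[0]
    return ttl, b"x" * vlen
```

Each call yields one (ttl, value) pair for a simulated set command, with expire time and value length drawn independently, matching the description above.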
> >
> > I ran the test process on the same machine that runs the memcached processes, so that it is easy to generate a heavy workload.
> >
> > I have the test results for the last 12 hours, watched in Cacti. It seems that there is no difference between the three for this traffic mode.
> > gets/sets = 9:1
> > hit_rate ~ 50%
> > [IMAGE]
> > I also print some detail statistics info in the test script:
> >
> > Cache list: ['192.168.116.213:33333', '192.168.116.213:44444', '192.168.116.213:55555']
> > send_key_number: 127306 --------> number of unique keys
> > test_loop: 0 --------> loop forever, no limit
> > weight of get/set commands: [10, 1] --------> the weights of the get/set commands. Note: if a get misses, the key is set immediately; that set is not counted into this weight.
> > show_interval: 10 --------> the interval for showing statistics info
> > stats_interval: 5 --------> the interval for fetching the stats of memcached
> > show_stats_interval: [60, 3600, 43200] --------> the time ranges shown, in seconds; e.g. "60" means "last 60s", and 3600 means "last 3600s"
> > len of values: [4, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000] --------> possible lengths of the values we set to memcached
> > weight of values' len: [3, 4, 5, 8, 8, 10, 5, 5, 2, 1] --------> weights of the different value lengths
> > expire-time of keys: [1, 5, 10, 30, 60, 300, 600, 3600, 86400, 0] --------> possible expire times used in set commands, independent of the value length
> > weight of keys' expire-time: [1, 1, 2, 5, 8, 5, 6, 5, 3, 1] --------> weights of the different expire times
> > ...
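The get/set weighting with the immediate backfill on a miss, as described in the notes above, could be driven by a loop roughly like this; `FakeClient` and `run_one_op` are hypothetical stand-ins for illustration, not the real script's code:

```python
import random

class FakeClient:
    """Dict-backed stand-in for a real memcached client library."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl=0):
        self.store[key] = value

def run_one_op(client, keys, get_weight=10, set_weight=1):
    """One driver step: choose get vs set by weight [10, 1]; on a get
    miss, set the key immediately (the backfill is outside the weight)."""
    key = random.choice(keys)
    op = random.choices(["get", "set"], weights=[get_weight, set_weight])[0]
    if op == "get":
        if client.get(key) is None:
            client.set(key, b"value")   # immediate backfill on miss
            return "miss"
        return "hit"
    client.set(key, b"value")
    return "set"
```

Counting the returned "hit"/"miss" results over an interval gives the hit rate reported below.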
> >
> > #28190284 command: 281902842 --------> the first number has no meaning; the second number is the number of commands we have sent to memcached.
> >
> > All the following numbers are recorded as increments, except the second number of the "items" field.
> > 192.168.116.213:33333
> > [60s] gets: 523063, hit: 49%, updates: 52141, dels: 0, items: -8/69423, read: 53891331, write: 215106364, OOMs: 0, evict: 6626
> > [3600s] gets: 29664649, hit: 49%, updates: 2966798, dels: 0, items: 13/69423, read: 3038408576, write: 12218798832, OOMs: 0, evict: 356323
> > 192.168.116.213:44444
> > [60s] gets: 523007, hit: 50%, updates: 52202, dels: 0, items: -62/69348, read: 53528995, write: 218847446, OOMs: 0, evict: 6539
> > [3600s] gets: 29667232, hit: 50%, updates: 2964220, dels: 0, items: -14/69348, read: 3030860658, write: 12405356058, OOMs: 0, evict: 359460
> > 192.168.116.213:55555
> > [60s] gets: 523093, hit: 49%, updates: 52116, dels: 0, items: 28/69396, read: 52993446, write: 215231210, OOMs: 0, evict: 6491
> > [3600s] gets: 29669464, hit: 49%, updates: 2961988, dels: 0, items: -25/69396, read: 3038356827, write: 12219764097, OOMs: 0, evict: 355644
> > ...
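For anyone reproducing the hit-rate column, memcached's plain-text `stats` reply (lines of the form `STAT name value`, terminated by `END`) reduces to a hit ratio with a small helper; `parse_stats` and `hit_rate` are illustrative names, a sketch rather than the test script's actual code:

```python
def parse_stats(text):
    """Turn memcached's `stats` reply into a name -> value dict."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            out[parts[1]] = parts[2]
    return out

def hit_rate(stats):
    """get_hits / (get_hits + get_misses), or 0.0 with no traffic."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

Taking the difference of the counters between two samples gives the per-interval rates shown above, rather than the rate since startup.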
> >
> >
> >
> > On Fri, Jan 16, 2015 at 9:29 PM, Zhiwei Chan <[email protected]> wrote:
> > Our maintenance team tends to be conservative, especially about performance-critical base software, so I think it is unlikely we can push this to production soon. But I wrote a pretty convenient tool in Python for an A/B test. The tool can fake traffic with random expire times and random value lengths, can specify the weights of the different expire times and lengths, and has lots of other functions. It is almost complete, and I can post a result next Monday.
> >
> > On Fri, Jan 16, 2015 at 11:12 AM, dormando <[email protected]> wrote:
> > If you want?
> >
> > What would make you confident enough to try the branch in production?
> > Or do you rely on your other patches and that's not really possible?
> >
> > On Thu, 15 Jan 2015, Zhiwei Chan wrote:
> >
> > > I tried to use real application traffic for a comparison test, but it seems that not everyone uses a cache client with consistent hashing in the dev environment. The result is that the traffic was not distributed as well as I expected.
> > > Should I fake the traffic for the comparison test instead of using real traffic? E.g., fake traffic with random expire-time keys to set and get against memcached.
> > >
> > > ---------------
> > > Host mc56 runs the latest LRU-rework branch's memcached, with options like "/usr/local/bin/memcached -u nobody -d -c 10240 -o lru_maintainer lru_crawler -m 64 -p 11811";
> > > host mc57 runs version 1.4.20_7_gb118a6c's memcached, with options like "/usr/bin/memcached -u nobody -d -c 10240 -o tail_repair_time=7200 -m 64 -p 11811".
> > >
> > > I summed up the stats of all memcached instances on each host and made the following analysis:
> > >
> > > Inline image 1
> > >
> > > On Wed, Jan 14, 2015 at 1:58 AM, dormando <[email protected]>
> wrote:
> > > Last update to the branch was 3 days ago. I'm not planning
> on doing any
> > > more work on it at the moment, so people have a chance to
> test it.
> > >
> > > thanks!
> > >
> > > On Tue, 13 Jan 2015, Zhiwei Chan wrote:
> > >
> > > > I compiled directly from your branch on the test server; please tell me if it needs an update and a recompile.
> > > >
> > > > On Tue, Jan 13, 2015 at 4:20 AM, dormando <
> [email protected]> wrote:
> > > > That sounds like an okay place to start. Can you
> please make sure the
> > > > other dev server is running the very latest
> version of the branch? A lot
> > > > changed since last friday... a few pretty bad bugs.
> > > >
> > > > Please use the startup options described in the
> middle of the PR.
> > > >
> > > > If anyone's brave enough to try the latest branch
> on one production
> > > > instance (if they have a low traffic one
> somewhere, maybe?) that'd be
> > > > good. I ran the branch under a load tester for a
> few hours, it passes
> > > > tests, etc. If I merge it, it'll just go into
> people's productions without
> > > > ever having a production test first, so hopefully
> someone can try it?
> > > >
> > > > thanks
> > > >
> > > > On Mon, 12 Jan 2015, Zhiwei Chan wrote:
> > > >
> > > > > I have run it since last Friday; so far no crash. As I finished the haproxy work today, I will try a comparison test of this LRU work tomorrow, as follows: there are two servers (CentOS 5.8, 8 cores, 8 GB memory) in the dev environment, and both run 32 memcached instances (processes) with a maximum memory of 128M. One server runs version 1.4.21, the other runs this branch. There are lots of "pools" using these memcached servers, and every pool uses two memcached instances on different servers. The pools' clients use a consistent-hash algorithm to distribute keys to their two memcached instances. I will watch the hit rate and other performance metrics using Cacti.
> > > > > I think it will work, but usually there is not much traffic in our dev environment. Please tell me if you have any other advice.
> > > > >
> > > > >
> > > > > 2015-01-08 4:21 GMT+08:00 dormando <
> [email protected]>:
> > > > > Hey,
> > > > >
> > > > > To all three of you: Just run it anywhere
> you can (but not more than one
> > > > > machine, yet?), with the options
> prescribed in the PR. Ideally you have
> > > > > graphs of the hit ratio and maybe cache
> fullness and can compare
> > > > > before/after.
> > > > >
> > > > > And let me know if it hangs or crashes,
> obviously. If so a backtrace
> > > > > and/or coredump would be fantastic.
> > > > >
> > > > > On Thu, 8 Jan 2015, Zhiwei Chan wrote:
> > > > >
> > > > > > I will deploy it to one of our test environments on CentOS 5.8 for a comparison test with 1.4.21, although the workload is not as heavy as in the production environment. Tell me if there is anything I can help with.
> > > > > >
> > > > > > 2015-01-07 23:30 GMT+08:00 Eric
> McConville <[email protected]>:
> > > > > > Same here. Do you want any findings posted to the mailing list, or the PR thread?
> > > > > >
> > > > > > On Wed, Jan 7, 2015 at 5:56 AM, Ryan
> McCullagh <[email protected]> wrote:
> > > > > > I'm willing to help out in any way
> possible. What can I do?
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: [email protected]
> [mailto:[email protected]] On
> > > > > > Behalf Of dormando
> > > > > > Sent: Wednesday, January 7, 2015 3:52 AM
> > > > > > To: [email protected]
> > > > > > Subject: memory efficiency / LRU
> refactor branch
> > > > > >
> > > > > > Yo,
> > > > > >
> > > > > >
> https://github.com/memcached/memcached/pull/97
> > > > > >
> > > > > > Opening to a wider audience. I need some folks willing to poke at it and see if their workloads fare better or worse with respect to hit ratios.
> > > > > >
> > > > > > The rest of the work remaining on
> my end is more testing, and some TODO's
> > > > > > noted in the PR. The remaining
> work is relatively small aside from the page
> > > > > > mover idea. It hasn't been
> crashing or hanging in my testing so far, but
> > > > > > that might still happen.
> > > > > >
> > > > > > I can't/won't merge this until I
> get some evidence that it's useful.
> > > > > > Hoping someone out there can lend
> a hand. I don't know what the actual
> > > > > > impact would be, but for some
> workloads it could be large. Even for folks
> > > > > > who have set all items to never
> expire, it could still potentially improve
> > > > > > hit ratios by better protecting
> active items.
> > > > > >
> > > > > > It will work best if you at least
> have a mix of items with TTL's that expire
> > > > > > in reasonable amounts of time.
> > > > > >
> > > > > > thanks,
> > > > > > -Dormando
> > > > > >
--
---
You received this message because you are subscribed to the Google Groups
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.