No worries man, you're doing us a favor. Let me know if there's anything you need from us, and I promise I'll be quicker this time :)

On Aug 18, 2015 12:01 AM, "dormando" <dorma...@rydia.net> wrote:
Hey,

I'm still really interested in working on this. I'll be taking a careful look soon, I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:

I've tweaked the program slightly, so I'm adding a new version. It prints more stats as it goes and runs a bit faster.

On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:

Total brain fart on my part. Apparently I had memcached 1.4.13 on my path (who knows how...). Using the one I actually built works. Sorry for the confusion; I can't believe I didn't realize that before. I'm testing against the compiled one now to see how it behaves.

On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:

You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:

The command line I've used that will start is:

    memcached -m 64 -o slab_reassign,slab_automove

The ones that fail are:

    memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
    memcached -o lru_crawler

I'm sure I've missed something during compile, though I just used ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:

I've attached a pretty simple program to connect, fill a slab with data, and then fill another slab slowly with data of a different size. I've been trying to get memcached to run with the lru_crawler and lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I try to start with either in any configuration.

I haven't seen it start to move slabs automatically with a freshly installed 1.2.24.

On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:

I realize I've not given you the tests to reproduce the behavior. I should be able to soon. Sorry about the delay here.

In the meantime, I wanted to bring up a possible secondary use of the same logic that moves items on slab rebalancing. I think the system might benefit from using that logic to crawl the pages in a slab and compact the data in the background. In the case where we have memory that is assigned to the slab but not being used, because of replaced or TTL'd out data, returning that memory to a pool of free memory would allow a slab to grow with that memory first instead of waiting for an event where memory is needed at that instant.

It's a change in approach, from reactive to proactive. What do you think?

On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:

> First, more detail for you:
>
> We are running 1.4.24 in production and haven't noticed any bugs as of yet. The new LRUs seem to be working well, though we nearly always run memcached scaled to hold all data without evictions. Those with evictions are behaving well. Those without evictions haven't seen crashing or any other noticeable bad behavior.

Neat.

> OK, I think I see an area where I was speculating on functionality. If you have a key in slab 21 and then the same key is written again at a larger size in slab 23, I assumed that the space in 21 was not freed on the second write. With that assumption, the LRU crawler would not free up that space.
> Also, just by observation at the macro level, the space is not freed fast enough, in our use case, to accept the writes that are happening. Think hundreds of millions of "overwrites" in a 6-10 hour period across a cluster.

Internally, "items" (key/value pairs) are generally immutable. The only time one isn't is for INCR/DECR, and it still becomes immutable if two INCR/DECRs collide.

What this means is that the new item is staged in a piece of free memory while the "upload" stage of the SET happens. When memcached has all of the data in memory to replace the item, it does an internal swap under a lock. The old item is removed from the hash table and LRU, and the new item gets put in its place (at the head of the LRU).

Since items are refcounted, this means that if other users are downloading an item which just got replaced, their memory doesn't get corrupted by the item changing out from underneath them. They can continue to read the old item until they're done. When the refcount reaches zero the old memory is reclaimed. Most of the time the item replacement happens and the old memory is immediately removed (a toy sketch of this replace-and-refcount flow is below).

However, this does mean that you need *one* piece of free memory to replace the old one. The old memory gets freed after that set.

So if you take a memcached instance with 0 free chunks and do a rolling replacement of all items (within the same slab class as before), the first one would cause an eviction from the tail of the LRU to get a free chunk. Every SET after that would use the chunk freed by the replacement of the previous item.

> After that last sentence I realized I also may not have explained the access pattern well enough. The keys are all overwritten every day, but it takes some time to write them all (obviously). We see a huge increase in the bytes metric as if the new data for the old keys was being written for the first time. Since the "old" slab for the same key doesn't proactively release memory, it starts to fill up the cache and then starts evicting data in the new slab. Once that happens, we see evictions in the old slab because of the algorithm you mentioned (random picking / freeing of memory). Typically we don't see any use for "upgrading" an item, as the new data would be entirely new and should wholesale replace the old data for that key. More specifically, the operation is always a set, with different data each day.

Right. Most of your problems will come from two areas. One is that when you write data aggressively into the new slab class (unless you set the rebalancer to always-replace mode), the mover will make memory available more slowly than you can insert. So you'll cause extra evictions in the new slab class.

The secondary problem is the random evictions in the previous slab class as stuff is chucked on the floor to make memory moveable.
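To make the replacement mechanics above concrete, here's a toy sketch of the replace-and-refcount flow. It is not memcached's actual code, just the shape of it: a SET stages the new item first, swaps it into the "hash table" slot under a lock, and the old item's memory is only reclaimed when the last reader drops its reference.

    /* Toy model of item replacement: not memcached code, just the idea.
     * A SET builds the new item first, swaps it into the "hash table" slot
     * under a lock, and the old item's memory is reclaimed only when its
     * refcount drops to zero. Build: gcc -pthread swap_toy.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    typedef struct item {
        int refcount;
        char data[64];
    } item;

    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
    static item *slot;                    /* stands in for the hash table entry */

    static item *item_alloc(const char *data) {
        item *it = calloc(1, sizeof(*it));
        it->refcount = 1;                 /* the reference held by the hash table */
        snprintf(it->data, sizeof(it->data), "%s", data);
        return it;
    }

    static void item_release(item *it) {
        pthread_mutex_lock(&cache_lock);
        int remaining = --it->refcount;
        pthread_mutex_unlock(&cache_lock);
        if (remaining == 0)
            free(it);                     /* old memory is reclaimed only here */
    }

    /* A reader grabs a reference; it can keep reading even if replaced. */
    static item *item_get(void) {
        pthread_mutex_lock(&cache_lock);
        item *it = slot;
        if (it) it->refcount++;
        pthread_mutex_unlock(&cache_lock);
        return it;
    }

    /* The SET: the new item is fully staged before we take the lock. */
    static void item_replace(item *new_it) {
        pthread_mutex_lock(&cache_lock);
        item *old = slot;
        slot = new_it;                    /* swap under the lock */
        pthread_mutex_unlock(&cache_lock);
        if (old)
            item_release(old);            /* drop the hash table's reference */
    }

    int main(void) {
        item_replace(item_alloc("value, day one"));
        item *reader = item_get();                    /* a client is mid-read */
        item_replace(item_alloc("value, day two"));   /* overwrite: old not freed yet */
        printf("reader still sees: %s\n", reader->data);
        item_release(reader);             /* last reference gone: old memory freed */
        return 0;
    }

The real thing obviously has a hash table, LRU links and finer-grained locks rather than a single slot and one global lock, but the lifetime rule is the same: the overwrite never frees memory in place; it needs one free chunk up front and hands the old chunk back afterwards.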
> As for testing, we'll be able to put it under real production workload. I don't know what kind of data you mean you need for testing. The data stored in the caches is highly confidential. I can give you all kinds of metrics, since we collect most of the ones that are in the stats output and some from the stats slabs output. If you have some specific ones that need collecting, I'll double check and make sure we can get those. Alternatively, it might be most beneficial to see the metrics in person :)

I just need stats snapshots here and there, and actually putting the thing under load. When I did the LRU work I had to beg for several months before anyone tested it with a production load. This slows things down and demotivates me from working on the project.

Unfortunately my dayjob keeps me pretty busy, so ~internet~ would probably be best.

> I can create a driver program to reproduce the behavior on a smaller scale. It would write e.g. 10k keys of 10k size, then rewrite the same keys with different size data. I'll work on that and post it to this thread when I can reproduce the behavior locally.

Ok. There are slab rebalance unit tests in the t/ directory which do things like this, and I've used mc-crusher to slam the rebalancer. It's pretty easy to run one config to load up 10k objects, then flip to the other using the same key namespace.
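A standalone repro along those lines is also pretty small. Something like this sketch would do it (untested, using libmemcached for brevity; the key prefix, counts, sizes and TTL are placeholders for whatever matches the real workload):

    /* Rough sketch of the repro driver idea: write N keys at ~10KB (the
     * smaller slab class, 21 in this thread's example), then overwrite the
     * same keys at ~15KB (a larger class, 23 here).
     * Build: gcc repro.c -lmemcached */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    #define NUM_KEYS   10000
    #define SMALL_SIZE (10 * 1024)
    #define LARGE_SIZE (15 * 1024)
    #define TTL        (7 * 24 * 60 * 60)   /* 7 day TTL, as in the workload */

    static void write_pass(memcached_st *memc, size_t value_size, char fill) {
        char *value = malloc(value_size);
        memset(value, fill, value_size);
        for (int i = 0; i < NUM_KEYS; i++) {
            char key[64];
            int keylen = snprintf(key, sizeof(key), "repro:%d", i);
            memcached_return_t rc = memcached_set(memc, key, (size_t)keylen,
                                                  value, value_size, TTL, 0);
            if (rc != MEMCACHED_SUCCESS)
                fprintf(stderr, "set %s failed: %s\n", key,
                        memcached_strerror(memc, rc));
        }
        free(value);
    }

    int main(void) {
        memcached_st *memc = memcached_create(NULL);
        if (memcached_server_add(memc, "localhost", 11211) != MEMCACHED_SUCCESS) {
            fprintf(stderr, "could not add server\n");
            return 1;
        }
        write_pass(memc, SMALL_SIZE, 'a');  /* "day one": fill the smaller class */
        write_pass(memc, LARGE_SIZE, 'b');  /* "day two": same keys, bigger data */
        memcached_free(memc);
        return 0;
    }

Watching stats slabs / stats items during and between the two passes is where you'd expect to see the first class fill up and then the evictions start once the second pass lands.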
> Thanks,
> Scott

On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:

Hey,

On Fri, 10 Jul 2015, Scott Mansfield wrote:

> We've seen issues recently with a cluster where the majority of items are typically overwritten in the same slab every day, and a sudden change in data size evicts a ton of data, affecting downstream systems. To be clear, that is our problem, but I think there's a tweak in memcached that might be useful, and another possible feature that would be even better.
>
> The data that is written to this cache is overwritten every day, though the TTL is 7 days. One slab takes up the majority of the space in the cache. The application wrote e.g. 10KB (slab 21) every day for each key consistently. One day, a change occurred where it started writing 15KB (slab 23), causing a migration of data from one slab to another. We had -o slab_reassign,slab_automove=1 set on the server, causing large numbers of evictions on the initial slab. Let's say the cache could hold the data at 15KB per key, but the old data was not technically TTL'd out in its old slab. This means that memory was not being freed by the lru crawler thread (I think) because its expiry had not come around.
>
> Lines 1199 and 1200 in items.c:
>
>     if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>
> If there was a check to see if this data was "orphaned," i.e. that the key, if accessed, would map to a different slab than the current one, then these orphans could be reclaimed as free memory. I am working on a patch to do this, though I have reservations about performing a hash on the key on the lru crawler thread (if the hash is not already available). I have very little experience in the memcached codebase, so I don't know the most efficient way to do this. Any help would be appreciated.

There seems to be a misconception about how the slab classes work. A key, if it already exists in a slab, will always map to the slab class it currently fits into. The slab classes always exist, but the amount of memory reserved for each of them will shift with slab_reassign. ie: 10 pages in slab class 21, then memory pressure on 23 causes pages to move over.

So if you examine a key that still exists in slab class 21, it has no reason to move up or down the slab classes.

> Alternatively, and possibly more beneficial, is compaction of data in a slab using the same set of criteria as lru crawling. Understandably, compaction is a very difficult problem to solve, since moving the data would be a pain in the ass. I saw a couple of discussions about this on the mailing list, though I didn't see any firm thoughts about it. I think it can probably be done in O(1) like the lru crawler by limiting the number of items it touches each time. Writing and reading are doable in O(1), so moving should be as well. Has anyone given more thought to compaction?

I'd be interested in hacking this up for you folks if you can provide me testing and some data to work with. With all of the LRU work I did in 1.4.24, the next thing I wanted to do is a big improvement to the slab reassignment code.

Currently it picks an essentially random slab page, empties it, and moves the slab page into the class under pressure.

One thing we can do is first look for free memory in the existing slab, ie:

- Take a page from slab 21
- Scan the page for valid items which need to be moved
- Pull free memory from slab 21, migrate the item (moderately complicated)
- When the page is empty, move it (or give up if you run out of free chunks).

The next step is to pull from the LRU on slab 21:

- Take a page from slab 21
- Scan the page for valid items
- Pull free memory from slab 21, migrate the item
- If no memory is free, evict the tail of slab 21 and use that chunk.
- When the page is empty, move it.

(Both passes are roughly sketched in the toy code at the end of this mail.)

Then, when you hit this condition, your least-recently-used data gets culled as pages migrate out of the class. This should match what would happen naturally if you were already evicting valid (but old) items to make room for new items.

A bonus to using the free memory trick is that I can use the amount of free space in a slab class as a heuristic to more quickly move slab pages around.

If it's still necessary from there, we can explore "upgrading" items to a new slab class, but that is much, much more complicated since the item has to shift LRUs. Do you put it at the head, the tail, the middle, etc.? It might be impossible to make a good generic decision there.

What version are you currently on? If 1.4.24, have you seen any instability? I'm currently torn between fighting a few bugs and starting on improving the slab rebalancer.
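Here's a very rough toy of those two passes, just to show the decision flow. The structures and names are made up, there's no real LRU, hash table or locking, and it's nothing like the real data layout; "lru" is just a counter and eviction means dropping the oldest item outside the page being emptied:

    /* Toy model of the two page-move passes above. Made-up structures,
     * not memcached code. */
    #include <stdio.h>

    #define NPAGES 3
    #define CHUNKS_PER_PAGE 4

    typedef struct { int used; int lru; char key[16]; } chunk;
    typedef struct { chunk c[CHUNKS_PER_PAGE]; } page;

    static page class21[NPAGES];          /* one slab class, e.g. class 21 */
    static int lru_clock = 0;

    /* Pass 1 helper: any free chunk in the class, excluding the victim page. */
    static chunk *find_free(page *victim) {
        for (int p = 0; p < NPAGES; p++) {
            if (&class21[p] == victim) continue;
            for (int i = 0; i < CHUNKS_PER_PAGE; i++)
                if (!class21[p].c[i].used) return &class21[p].c[i];
        }
        return NULL;
    }

    /* Pass 2 helper: evict the oldest item outside the victim page. */
    static chunk *evict_oldest(page *victim) {
        chunk *oldest = NULL;
        for (int p = 0; p < NPAGES; p++) {
            if (&class21[p] == victim) continue;
            for (int i = 0; i < CHUNKS_PER_PAGE; i++) {
                chunk *ch = &class21[p].c[i];
                if (ch->used && (!oldest || ch->lru < oldest->lru)) oldest = ch;
            }
        }
        if (oldest) {
            printf("evicted %s to free a chunk\n", oldest->key);
            oldest->used = 0;
        }
        return oldest;
    }

    /* Empty one page so it can be handed to the slab class under pressure. */
    static int evacuate(page *victim) {
        for (int i = 0; i < CHUNKS_PER_PAGE; i++) {
            if (!victim->c[i].used) continue;
            chunk *dst = find_free(victim);        /* pass 1: use free memory */
            if (!dst) dst = evict_oldest(victim);  /* pass 2: evict the LRU tail */
            if (!dst) return -1;                   /* give up */
            *dst = victim->c[i];                   /* migrate the item */
            dst->lru = ++lru_clock;                /* toy: treat the move as a touch */
            victim->c[i].used = 0;
        }
        return 0;
    }

    int main(void) {
        /* Fill the class, then free two chunks to give pass 1 something to use. */
        int n = 0;
        for (int p = 0; p < NPAGES; p++)
            for (int i = 0; i < CHUNKS_PER_PAGE; i++, n++) {
                class21[p].c[i].used = 1;
                class21[p].c[i].lru = ++lru_clock;
                snprintf(class21[p].c[i].key, sizeof(class21[p].c[i].key), "key%d", n);
            }
        class21[2].c[2].used = 0;
        class21[2].c[3].used = 0;

        if (evacuate(&class21[0]) == 0)
            printf("page 0 is empty and can move to the class under pressure\n");
        return 0;
    }

The real version has to juggle refcounts, the hash table and the actual LRU links while copying each item, which is where the "moderately complicated" part comes in.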
-Dormando