The command line I've used that will start is:

    memcached -m 64 -o slab_reassign,slab_automove

The ones that fail are:

    memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer
    memcached -o lru_crawler

I'm sure I've missed something during compile, though I just used ./configure and make.

On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
>
> I've attached a pretty simple program to connect, fill a slab with data, and then fill another slab slowly with data of a different size. I've been trying to get memcached to run with the lru_crawler and lru_maintainer flags, but I get 'Illegal suboption "(null)"' every time I try to start with either in any configuration.
>
> I haven't seen it start to move slabs automatically with a freshly installed 1.4.24.
>
> On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
>>
>> I realize I've not given you the tests to reproduce the behavior. I should be able to soon. Sorry about the delay here.
>>
>> In the meantime, I wanted to bring up a possible secondary use of the same logic to move items on slab rebalancing. I think the system might benefit from using the same logic to crawl the pages in a slab and compact the data in the background. In the case where we have memory that is assigned to the slab but not being used because of replaced or TTL'd out data, returning the memory to a pool of free memory will allow a slab to grow with that memory first instead of waiting for an event where memory is needed at that instant.
>>
>> It's a change in approach, from reactive to proactive. What do you think?
>>
>> On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
>>>
>>> > First, more detail for you:
>>> >
>>> > We are running 1.4.24 in production and haven't noticed any bugs as of yet. The new LRUs seem to be working well, though we nearly always run memcached scaled to hold all data without evictions. Those with evictions are behaving well.
>>> > Those without evictions haven't seen crashing or any other noticeable bad behavior.
>>>
>>> Neat.
>>>
>>> > OK, I think I see an area where I was speculating on functionality. If you have a key in slab 21 and then the same key is written again at a larger size in slab 23, I assumed that the space in 21 was not freed on the second write. With that assumption, the LRU crawler would not free up that space. Also, just by observation in the macro, the space is not freed fast enough to be effective, in our use case, to accept the writes that are happening. Think in the hundreds of millions of "overwrites" in a 6 - 10 hour period across a cluster.
>>>
>>> Internally, "items" (a key/value pair) are generally immutable. The only time that's not the case is for INCR/DECR, and an item still becomes immutable if two INCR/DECR's collide.
>>>
>>> What this means is that the new item is staged in a piece of free memory while the "upload" stage of the SET happens. When memcached has all of the data in memory to replace the item, it does an internal swap under a lock. The old item is removed from the hash table and LRU, and the new item gets put in its place (at the head of the LRU).
>>>
>>> Since items are refcounted, this means that if other users are downloading an item which just got replaced, their memory doesn't get corrupted by the item changing out from underneath them. They can continue to read the old item until they're done. When the refcount reaches zero the old memory is reclaimed.
>>>
>>> Most of the time, the item replacement happens and then the old memory is immediately freed.
>>>
>>> However, this does mean that you need *one* piece of free memory to replace the old one. Then the old memory gets freed after that set.
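To make the replacement sequence described above concrete, here is a minimal single-threaded sketch in C. It is not memcached's code: the `item` struct, `item_new`, `item_acquire`, `item_release`, `item_replace`, and the `live_items` counter are all hypothetical names made up for illustration, and the lock that guards the swap in a real server is only mentioned in a comment. It shows the new value being staged in fresh memory first, the slot being swapped, and the old memory being freed only once its last reader lets go.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified item; not memcached's real struct. */
typedef struct item {
    int refcount;      /* one for the table slot, plus one per active reader */
    char data[64];     /* value bytes, always NUL-terminated in this sketch */
} item;

static int live_items = 0;   /* number of items currently allocated */

/* Stage a new item in fresh memory; the table slot will own one reference. */
static item *item_new(const char *value) {
    item *it = calloc(1, sizeof(*it));
    if (it == NULL) return NULL;
    strncpy(it->data, value, sizeof(it->data) - 1);
    it->refcount = 1;
    live_items++;
    return it;
}

/* A reader takes a reference before reading the value. */
static item *item_acquire(item *it) {
    it->refcount++;
    return it;
}

/* Drop one reference; the memory is reclaimed when the count reaches zero. */
static void item_release(item *it) {
    if (--it->refcount == 0) {
        free(it);
        live_items--;
    }
}

/* Replace the item in one hash-table slot. Needs one free piece of memory. */
static int item_replace(item **slot, const char *value) {
    item *fresh = item_new(value);   /* staged before the old item is touched */
    if (fresh == NULL) return -1;
    item *old = *slot;
    *slot = fresh;                   /* the swap; a real server does this under a lock */
    if (old != NULL) item_release(old);  /* drop the slot's reference to the old item */
    return 0;
}
```

If a reader still holds the old item when `item_replace` runs, both items exist for a moment, which is why one spare piece of memory is needed; once the reader calls `item_release`, only the new item remains.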
>>>
>>> So if you take a memcached instance with 0 free chunks, and do a rolling replacement of all items (within the same slab class as before), the first one would cause an eviction from the tail of the LRU to get a free chunk. Every SET after that would use the chunk freed from the replacement of the previous memory.
>>>
>>> > After that last sentence I realized I also may not have explained the access pattern well enough. The keys are all overwritten every day, but it takes some time to write them all (obviously). We see a huge increase in the bytes metric as if the new data for the old keys was being written for the first time. Since the "old" slab for the same key doesn't proactively release memory, it starts to fill up the cache and then starts evicting data in the new slab. Once that happens, we see evictions in the old slab because of the algorithm you mentioned (random picking / freeing of memory). Typically we don't see any use for "upgrading" an item as the new data would be entirely new and should wholesale replace the old data for that key. More specifically, the operation is always set, with different data each day.
>>>
>>> Right. Most of your problems will come from two areas. One is that when you write data aggressively into the new slab class (unless you set the rebalancer to always-replace mode), the mover will make memory available more slowly than you can insert. So you'll cause extra evictions in the new slab class.
>>>
>>> The secondary problem is from the random evictions in the previous slab class as stuff is chucked on the floor to make memory moveable.
>>>
>>> > As for testing, we'll be able to put it under real production workload. I don't know what kind of data you mean you need for testing. The data stored in the caches are highly confidential.
>>> > I can give you all kinds of metrics, since we collect most of the ones that are in the stats and some from the stats slabs output. If you have some specific ones that need collecting, I'll double check and make sure we can get those. Alternatively, it might be most beneficial to see the metrics in person :)
>>>
>>> I just need stats snapshots here and there, and actually putting the thing under load. When I did the LRU work I had to beg for several months before anyone tested it with a production load. This slows things down and demotivates me from working on the project.
>>>
>>> Unfortunately my dayjob keeps me pretty busy so ~internet~ would probably be best.
>>>
>>> > I can create a driver program to reproduce the behavior on a smaller scale. It would write e.g. 10k keys of 10k size, then rewrite the same keys with different size data. I'll work on that and post it to this thread when I can reproduce the behavior locally.
>>>
>>> Ok. There're slab rebalance unit tests in the t/ directory which do things like this, and I've used mc-crusher to slam the rebalancer. It's pretty easy to run one config to load up 10k objects, then flip to the other using the same key namespace.
>>>
>>> > Thanks,
>>> > Scott
>>> >
>>> > On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>>> > Hey,
>>> >
>>> > On Fri, 10 Jul 2015, Scott Mansfield wrote:
>>> >
>>> > > We've seen issues recently where we run a cluster that typically has the majority of items overwritten in the same slab every day and a sudden change in data size evicts a ton of data, affecting downstream systems. To be clear that is our problem, but I think there's a tweak in memcached that might be useful and another possible feature that would be even better.
>>> > >
>>> > > The data that is written to this cache is overwritten every day, though the TTL is 7 days.
>>> > > One slab takes up the majority of the space in the cache. The application wrote e.g. 10KB (slab 21) every day for each key consistently. One day, a change occurred where it started writing 15KB (slab 23), causing a migration of data from one slab to another. We had -o slab_reassign,slab_automove=1 set on the server, causing large numbers of evictions on the initial slab. Let's say the cache could hold the data at 15KB per key, but the old data was not technically TTL'd out in its old slab. This means that memory was not being freed by the lru crawler thread (I think) because its expiry had not come around.
>>> > >
>>> > > lines 1199 and 1200 in items.c:
>>> > > if ((search->exptime != 0 && search->exptime < current_time) || is_flushed(search)) {
>>> > >
>>> > > If there was a check to see if this data was "orphaned," i.e. that the key, if accessed, would map to a different slab than the current one, then these orphans could be reclaimed as free memory. I am working on a patch to do this, though I have reservations about performing a hash on the key on the lru crawler thread (if the hash is not already available).
>>> > >
>>> > > I have very little experience in the memcached codebase so I don't know the most efficient way to do this. Any help would be appreciated.
>>> >
>>> > There seems to be a misconception about how the slab classes work. A key, if already existing in a slab, will always map to the slab class it currently fits into. The slab classes always exist, but the amount of memory reserved for each of them will shift with the slab_reassign. ie: 10 pages in slab class 21, then memory pressure on 23 causes it to move over.
>>> >
>>> > So if you examine a key that still exists in slab class 21, it has no reason to move up or down the slab classes.
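As a rough illustration of why a stored item stays in the class it fits into, the sketch below builds a table of chunk sizes and maps an item size to the smallest class that can hold it. The numbers are assumptions, not values taken from this thread: a 96-byte base chunk, a 1.25 growth factor, 8-byte alignment, and a 1 MB maximum item size. `build_classes`, `class_for`, and `class_size` are hypothetical names, and the class numbers this produces are not guaranteed to match the 21 and 23 quoted above.

```c
#include <stddef.h>

#define MAX_CLASSES 64
#define ALIGN_BYTES 8

static size_t class_size[MAX_CLASSES];  /* chunk size of each class */
static int num_classes = 0;

/* Build chunk sizes: start at `base`, grow by `factor`, round up to 8 bytes. */
static void build_classes(size_t base, double factor, size_t max_item) {
    size_t size = base;
    num_classes = 0;
    while (num_classes < MAX_CLASSES - 1 &&
           size <= (size_t)((double)max_item / factor)) {
        if (size % ALIGN_BYTES)
            size += ALIGN_BYTES - (size % ALIGN_BYTES);
        class_size[num_classes++] = size;
        size = (size_t)((double)size * factor);
    }
    class_size[num_classes++] = max_item;  /* last class holds the largest items */
}

/* Return the 1-based id of the smallest class that fits, or 0 if too large. */
static int class_for(size_t item_bytes) {
    for (int i = 0; i < num_classes; i++) {
        if (item_bytes <= class_size[i])
            return i + 1;
    }
    return 0;
}
```

The class is a pure function of the item's total size, so an existing 10KB item has no reason to change class; only a new, larger write for the same key (for example 15KB) lands in a higher class, and the page counts behind each class are what slab_reassign shifts.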
>>> >
>>> > > Alternatively, and possibly more beneficial, is compaction of data in a slab using the same set of criteria as lru crawling. Understandably, compaction is a very difficult problem to solve since moving the data would be a pain in the ass. I saw a couple of discussions about this in the mailing list, though I didn't see any firm thoughts about it. I think it can probably be done in O(1) like the lru crawler by limiting the number of items it touches each time. Writing and reading are doable in O(1) so moving should be as well. Has anyone given more thought on compaction?
>>> >
>>> > I'd be interested in hacking this up for you folks if you can provide me testing and some data to work with. With all of the LRU work I did in 1.4.24, the next thing I wanted to do is a big improvement on the slab reassignment code.
>>> >
>>> > Currently it picks essentially a random slab page, empties it, and moves the slab page into the class under pressure.
>>> >
>>> > One thing we can do is first examine for free memory in the existing slab, IE:
>>> >
>>> > - Take a page from slab 21
>>> > - Scan the page for valid items which need to be moved
>>> > - Pull free memory from slab 21, migrate the item (moderately complicated)
>>> > - When the page is empty, move it (or give up if you run out of free chunks).
>>> >
>>> > The next step is to pull from the LRU on slab 21:
>>> >
>>> > - Take page from slab 21
>>> > - Scan page for valid items
>>> > - Pull free memory from slab 21, migrate the item
>>> > - If no memory free, evict tail of slab 21. use that chunk.
>>> > - When the page is empty, move it.
>>> >
>>> > Then, when you hit this condition your least-recently-used data gets culled as new data migrates your page class.
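The two step lists above can be sketched as a toy counting model in C. This is only an illustration of the proposal, not the rebalancer's actual code: `slab_class_model` and `empty_page` are hypothetical names, items are reduced to counters, and the model assumes the LRU tail always lives outside the page being emptied. With `allow_evict` set to 0 it follows the first list (give up when free chunks run out); with 1 it follows the second (evict the tail and reuse that chunk).

```c
/* Hypothetical toy model of one slab class while a single page is emptied. */
typedef struct {
    int free_chunks;  /* free chunks in the class, outside the page being emptied */
    int lru_items;    /* items stored outside that page (eviction candidates) */
    int moved;        /* items migrated out of the page so far */
    int evicted;      /* LRU-tail evictions performed to make room */
} slab_class_model;

/*
 * page[i] is 1 if chunk i holds a valid item, 0 if it is already free.
 * Returns 1 if the page ends up empty and can be moved, 0 if we gave up.
 */
static int empty_page(slab_class_model *c, int *page, int chunks, int allow_evict) {
    for (int i = 0; i < chunks; i++) {
        if (!page[i])
            continue;                 /* nothing to migrate from this chunk */
        if (c->free_chunks > 0) {
            c->free_chunks--;         /* pull free memory from the same class */
            c->lru_items++;           /* the migrated item now lives outside the page */
        } else if (allow_evict && c->lru_items > 0) {
            c->evicted++;             /* drop the tail; its chunk takes the migrated item */
        } else {
            return 0;                 /* out of free chunks: give up on this page */
        }
        page[i] = 0;
        c->moved++;
    }
    return 1;
}
```

In this model every eviction removes the oldest item in the class to keep a newer one from the page, which is the culling of least-recently-used data the message describes.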
>>> > This should match a natural occurrence if you would already be evicting valid (but old) items to make room for new items.
>>> >
>>> > A bonus to using the free memory trick is that I can use the amount of free space in a slab class as a heuristic to more quickly move slab pages around.
>>> >
>>> > If it's still necessary from there, we can explore "upgrading" items to a new slab class, but that is much much more complicated since the item has to shift LRU's. Do you put it at the head, the tail, the middle, etc? It might be impossible to make a good generic decision there.
>>> >
>>> > What version are you currently on? If 1.4.24, have you seen any instability? I'm currently torn between fighting a few bugs and starting on improving the slab rebalancer.
>>> >
>>> > -Dormando
>>> >
>>> > --
>>> >
>>> > ---
>>> > You received this message because you are subscribed to the Google Groups "memcached" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an email to memcached+...@googlegroups.com.
>>> > For more options, visit https://groups.google.com/d/optout.