First, more detail for you:

We are running 1.4.24 in production and haven't noticed any bugs yet. The 
new LRUs seem to be working well, though we nearly always run memcached 
scaled to hold all data without evictions. The clusters that do see 
evictions are behaving well, and the ones without evictions haven't shown 
crashes or any other noticeable bad behavior.


OK, I think I see where I was speculating about functionality. If you 
have a key in slab 21 and the same key is then written again at a larger 
size into slab 23, I assumed that the space in 21 was not freed on the 
second write. Under that assumption, the LRU crawler would not free up that 
space. Also, just from observation at the macro level, the space is not 
freed fast enough, in our use case, to accept the writes that are 
happening. Think hundreds of millions of "overwrites" in a 6-10 hour 
period across a cluster.
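
For context on the class numbers here: memcached maps each item size to 
the smallest chunk that fits, with chunk sizes growing geometrically. A 
standalone sketch of that mapping, assuming the default 1.25 growth factor 
and an illustrative 96-byte base chunk (real class numbers depend on the 
-n and -f settings and the item header size, so 21 and 23 are just 
examples):

    #include <stdio.h>

    /* Illustrative only: prints approximate chunk sizes per slab class
     * under an assumed 96-byte base chunk and the default 1.25 growth
     * factor (-f). The classes that 10KB and 15KB items land in will
     * differ under other settings. */
    int main(void) {
        double chunk = 96.0;        /* assumed smallest chunk */
        const double factor = 1.25; /* default growth factor */
        for (int cls = 1; cls <= 30; cls++) {
            printf("class %2d: ~%.0f byte chunks\n", cls, chunk);
            chunk *= factor;
        }
        return 0;
    }

The relevant point is that a 10KB value and a 15KB value land in different 
classes, and pages already assigned to the smaller class don't 
automatically follow the data to the larger one.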

After that last sentence I realized I also may not have explained the 
access pattern well enough. The keys are all overwritten every day, but it 
takes some time to write them all (obviously). We see a huge increase in 
the bytes metric, as if the new data for the old keys was being written for 
the first time. Since the "old" slab for the same key doesn't proactively 
release memory, it starts to fill up the cache, which then starts evicting 
data in the new slab. Once that happens, we see evictions in the old slab 
because of the algorithm you mentioned (random picking / freeing of 
memory). Typically we wouldn't have any use for "upgrading" an item, as 
the new data is entirely new and should wholesale replace the old data for 
that key. More specifically, the operation is always a set, with different 
data each day.
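
To put rough, made-up numbers on it (illustrative only, not our real 
figures): if a node holds 1M keys at 10KB each, that's ~10GB of pages 
assigned to the old class. Rewriting every key at 15KB wants ~15GB in the 
new class, and until the old class's pages are released, the node 
transiently holds both copies of every key already rewritten. A node sized 
for the ~15GB steady state starts evicting long before the daily rewrite 
finishes.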

As for testing, we'll be able to put it under real production workload. 
I'm not sure what kind of data you need for testing, though; the data 
stored in the caches is highly confidential. I can give you all kinds of 
metrics, since we collect most of the ones in the stats output and some 
from the stats slabs output. If there are specific ones that need 
collecting, I'll double check and make sure we can get those. 
Alternatively, it might be most beneficial to see the metrics in person :)

I can create a driver program to reproduce the behavior on a smaller scale. 
It would write e.g. 10k keys of 10k size, then rewrite the same keys with 
different-size data. I'll work on that and post it to this thread when I 
can reproduce the behavior locally.
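
In the meantime, a first cut of that driver might look like the sketch 
below, using libmemcached. The key names, sizes, TTL, and server address 
are placeholders, not our production values:

    /* Rough driver to reproduce the slab-migration evictions locally.
     * Compile with -lmemcached. */
    #include <libmemcached/memcached.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NKEYS       10000
    #define FIRST_SIZE  10240            /* ~10KB: one slab class */
    #define SECOND_SIZE 15360            /* ~15KB: a larger class */
    #define TTL         (7 * 24 * 60 * 60)

    static void write_pass(memcached_st *memc, size_t size) {
        char *value = malloc(size);
        if (value == NULL) return;
        memset(value, 'x', size);
        for (int i = 0; i < NKEYS; i++) {
            char key[32];
            int klen = snprintf(key, sizeof(key), "driver-key-%d", i);
            memcached_return_t rc = memcached_set(memc, key, (size_t)klen,
                                                  value, size, TTL, 0);
            if (rc != MEMCACHED_SUCCESS)
                fprintf(stderr, "set %s: %s\n", key,
                        memcached_strerror(memc, rc));
        }
        free(value);
    }

    int main(void) {
        memcached_st *memc = memcached_create(NULL);
        memcached_server_add(memc, "localhost", 11211);
        write_pass(memc, FIRST_SIZE);   /* day 1: fill the smaller class */
        write_pass(memc, SECOND_SIZE);  /* day 2: same keys, larger class */
        memcached_free(memc);
        return 0;
    }

If the theory above is right, watching stats slabs between the two passes 
should show the first class holding its pages while the second class 
starts evicting.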

Thanks,
Scott

On Saturday, July 11, 2015 at 12:05:54 PM UTC-7, Dormando wrote:
>
> Hey, 
>
> On Fri, 10 Jul 2015, Scott Mansfield wrote: 
>
> > We've seen issues recently where we run a cluster that typically has the 
> majority of items overwritten in the same slab every day and a sudden 
> change in data size evicts a ton of data, affecting downstream systems. To 
> be clear, that is our problem, but I think there's a tweak in memcached that 
> might be useful and another possible feature that would be even 
> > better. 
> > The data that is written to this cache is overwritten every day, though 
> the TTL is 7 days. One slab takes up the majority of the space in the 
> cache. The application wrote e.g. 10KB (slab 21) every day for each key 
> consistently. One day, a change occurred where it started writing 15KB 
> (slab 23), causing a migration of data from one slab to another. We had -o 
> > slab_reassign,slab_automove=1 set on the server, causing large numbers 
> of evictions on the initial slab. Let's say the cache could hold the data 
> at 15KB per key, but the old data was not technically TTL'd out in its old 
> slab. This means that memory was not being freed by the lru crawler thread 
> (I think) because its expiry had not come around.  
> > 
> > lines 1199 and 1200 in items.c: 
> > if ((search->exptime != 0 && search->exptime < current_time) || 
> >     is_flushed(search)) { 
> > 
> > If there was a check to see if this data was "orphaned," i.e. that the 
> key, if accessed, would map to a different slab than the current one, then 
> these orphans could be reclaimed as free memory. I am working on a patch to 
> do this, though I have reservations about performing a hash on the key on 
> the lru crawler thread (if the hash is not already available). 
> > I have very little experience in the memcached codebase so I don't know 
> the most efficient way to do this. Any help would be appreciated. 
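
For concreteness, the check being proposed would have extended the quoted 
condition roughly as follows. This is a fragment in the style of the 
snippet above, not compilable standalone; hash and assoc_find exist in the 
source, but the exact integration point and locking are guessed, and the 
reply just below explains why the underlying premise doesn't hold:

    /* Hypothetical "orphan" test inside the LRU crawler: look up the
     * crawled key; if the hash table now points at a different item,
     * treat this one as reclaimable. The item_lock(hv) discipline is
     * glossed over and would matter on the crawler thread. */
    uint32_t hv = hash(ITEM_key(search), search->nkey);
    item *live = assoc_find(ITEM_key(search), search->nkey, hv);
    if ((search->exptime != 0 && search->exptime < current_time)
        || is_flushed(search)
        || live != search) {  /* key now maps to a newer item elsewhere */
        /* reclaim search's chunk */
    }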
>
> There seems to be a misconception about how the slab classes work. A key, 
> if already existing in a slab, will always map to the slab class it 
> currently fits into. The slab classes always exist, but the amount of 
> memory reserved for each of them will shift with the slab_reassign. ie: 10 
> pages in slab class 21, then memory pressure on 23 causes it to move over. 
>
> So if you examine a key that still exists in slab class 21, it has no 
> reason to move up or down the slab classes. 
>
> > Alternatively, and possibly more beneficial is compaction of data in a 
> slab using the same set of criteria as lru crawling. Understandably, 
> compaction is a very difficult problem to solve since moving the data would 
> be a pain in the ass. I saw a couple of discussions about this in the 
> mailing list, though I didn't see any firm thoughts about it. I think it 
> > can probably be done in O(1) like the lru crawler by limiting the number 
> of items it touches each time. Writing and reading are doable in O(1) so 
> moving should be as well. Has anyone given more thought on compaction? 
>
> I'd be interested in hacking this up for you folks if you can provide me 
> testing and some data to work with. With all of the LRU work I did in 
> 1.4.24, the next thing I wanted to do is a big improvement on the slab 
> reassignment code. 
>
> Currently it picks essentially a random slab page, empties it, and moves 
> the slab page into the class under pressure. 
>
> One thing we can do is first examine for free memory in the existing slab, 
> IE: 
>
> - Take a page from slab 21 
> - Scan the page for valid items which need to be moved 
> - Pull free memory from slab 21, migrate the item (moderately complicated) 
> - When the page is empty, move it (or give up if you run out of free 
> chunks). 
>
> The next step is to pull from the LRU on slab 21: 
>
> - Take page from slab 21 
> - Scan page for valid items 
> - Pull free memory from slab 21, migrate the item 
>   - If no memory free, evict tail of slab 21. use that chunk. 
> - When the page is empty, move it. 
>
> Then, when you hit this condition, your least-recently-used data gets 
> culled as new data migrates into your page class. This should match the 
> natural occurrence where you would already be evicting valid (but old) 
> items to make room for new items. 
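
A self-contained toy model of that second loop (stand-in types and a crude 
clock-based LRU, not memcached's actual structures) behaves like this:

    #include <stdio.h>

    #define CHUNKS      128   /* whole toy slab class */
    #define PAGE_CHUNKS  64   /* chunks 0..63 act as the page to move */

    /* One toy slab class: per-chunk used flags plus an LRU age stamp. */
    typedef struct { int used[CHUNKS]; int age[CHUNKS]; int clock; } toy_class;

    /* Find a free chunk outside the page being evacuated. */
    static int alloc_chunk(toy_class *c) {
        for (int i = PAGE_CHUNKS; i < CHUNKS; i++)
            if (!c->used[i]) { c->used[i] = 1; c->age[i] = ++c->clock; return i; }
        return -1;
    }

    /* Evict the least-recently-used chunk outside the page. */
    static void evict_lru_tail(toy_class *c) {
        int victim = -1, oldest = c->clock + 1;
        for (int i = PAGE_CHUNKS; i < CHUNKS; i++)
            if (c->used[i] && c->age[i] < oldest) { oldest = c->age[i]; victim = i; }
        if (victim >= 0) c->used[victim] = 0;
    }

    /* Scan the page; migrate live items into free chunks of the same
     * class, evicting the LRU tail when nothing is free. Afterward the
     * page is empty and could move to the class under pressure. */
    static void evacuate_page(toy_class *c) {
        for (int i = 0; i < PAGE_CHUNKS; i++) {
            if (!c->used[i]) continue;
            int dst = alloc_chunk(c);
            if (dst < 0) { evict_lru_tail(c); dst = alloc_chunk(c); }
            /* real code would copy the item and re-link hash/LRU here */
            c->used[i] = 0;
        }
    }

    int main(void) {
        toy_class c = {0};
        for (int i = 0; i < CHUNKS; i++) { c.used[i] = 1; c.age[i] = ++c.clock; }
        evacuate_page(&c);
        int live = 0;
        for (int i = PAGE_CHUNKS; i < CHUNKS; i++) live += c.used[i];
        printf("page empty; %d/%d chunks live outside it\n",
               live, CHUNKS - PAGE_CHUNKS);
        return 0;
    }

Running it with every chunk occupied shows the page draining to empty 
while the class's LRU tail is culled one chunk per migrated item.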
>
> A bonus to using the free memory trick is that I can use the amount of 
> free space in a slab class as a heuristic to more quickly move slab pages 
> around. 
>
> If it's still necessary from there, we can explore "upgrading" items to a 
> new slab class, but that is much much more complicated since the item has 
> to shift LRU's. Do you put it at the head, the tail, the middle, etc? It 
> might be impossible to make a good generic decision there. 
>
> What version are you currently on? If 1.4.24, have you seen any 
> instability? I'm currently torn between fighting a few bugs and starting 
> on improving the slab rebalancer. 
>
> -Dormando


