Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Mon, Jun 30, 2014 at 10:49:03AM -0500, Christoph Lameter wrote:
> On Fri, 27 Jun 2014, Joonsoo Kim wrote:
> > Christoph,
> > Is this result tolerable for large scale systems? Or do we need to find
> > another solution?
>
> The overhead is pretty intense but then this is a rare event I guess?

Yes, provided cgroups are created/destroyed rarely.

> It seems that it is much easier on the code and much faster to do the
> periodic reaping. Why not simply go with that?

A bad thing about the periodic reaping is that the time it may take isn't
predictable, because the number of dead caches is, in fact, limited only by
the amount of RAM. We can have hundreds, if not thousands, of copies of
dcaches/icaches left over from cgroups destroyed some time ago. The dead
caches will hang around until memory pressure evicts all the objects they
host, which may take quite long on systems with a lot of memory. With
periodic reaping, we would have to iterate over all dead caches trying to
drain their per-cpu/node arrays each time, which might therefore slow down
the whole system unexpectedly.

I'm not quite sure such slowdowns are really a threat, though. Actually,
cache_reap will only do something (take locks, drain arrays/lists) if
there are free objects in the cache. Otherwise it will, in fact, only
check cpu_cache->avail, alien->avail, shared->avail, and node->free_list,
which shouldn't take much time, should it?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Fri, 27 Jun 2014, Joonsoo Kim wrote:
> Christoph,
> Is this result tolerable for large scale systems? Or do we need to find
> another solution?

The overhead is pretty intense but then this is a rare event I guess?

It seems that it is much easier on the code and much faster to do the
periodic reaping. Why not simply go with that?
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Wed, Jun 25, 2014 at 05:45:45PM +0400, Vladimir Davydov wrote:
> On Tue, Jun 24, 2014 at 04:38:41PM +0900, Joonsoo Kim wrote:
> > On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> > And, you said that this way of implementation would be slow because
> > there could be many objects in dead caches and this implementation
> > needs the node spin_lock on each object freeing. Is it no problem now?
> >
> > If you have any performance data about this implementation and
> > the alternative one, could you share it?
>
> I ran some tests on a 2 CPU x 6 core x 2 HT box. The kernel was compiled
> with a config taken from a popular distro, so it had most of the debug
> options turned off.
>
> ---
>
> TEST #1: Each logical CPU executes a task that frees 1M objects
> allocated from the same cache. All frees are node-local.
>
> RESULTS:
>
> objsize (bytes) | cache is dead? | objects free time (ms)
> ----------------+----------------+-----------------------
>              64 |       -        |    373 +-  5
>                 |       +        |   1300 +-  6
>                 |                |
>             128 |       -        |    387 +-  6
>                 |       +        |   1337 +-  6
>                 |                |
>             256 |       -        |    484 +-  4
>                 |       +        |   1407 +-  6
>                 |                |
>             512 |       -        |    686 +-  5
>                 |       +        |   1561 +- 18
>                 |                |
>            1024 |       -        |   1073 +- 11
>                 |       +        |   1897 +- 12
>
> TEST #2: Each logical CPU executes a task that removes 1M empty files
> from its own RAMFS mount. All frees are node-local.
>
> RESULTS:
>
> cache is dead? | files removal time (s)
> ---------------+-----------------------
>       -        |  15.57 +- 0.55 (base)
>       +        |  16.80 +- 0.62 (base + 8%)
>
> ---
>
> So, according to TEST #1 the relative slowdown introduced by zapping per
> cpu arrays is really dreadful - it can be up to 4x! However, the
> absolute numbers aren't that huge - ~1 second for 24 million objects.
> If we do something other than kfree, the slowdown shouldn't be that
> visible IMO.
>
> TEST #2 is an attempt to estimate how zapping of per cpu arrays will
> affect FS object destruction, which is the most common use case for dead
> caches. To avoid disk-bound operations it uses RAMFS. From the test
> results it follows that the relative slowdown of massive file deletion
> is within 2 stdev, which looks decent.
>
> Anyway, the alternative approach (reaping dead caches periodically)
> won't have this kfree slowdown at all. However, periodic reaping can
> become a real disaster as the system evolves and the number of dead
> caches grows. Currently I don't know how we can estimate the real life
> effects of this. If you have any ideas, please let me know.

Hello,

I have no idea here. I don't have much experience with large scale
systems. But the current implementation would also have big trouble if
the system were larger than yours.

I think that Christoph can say something about this result.

Christoph,
Is this result tolerable for large scale systems? Or do we need to find
another solution?

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Tue, Jun 24, 2014 at 04:38:41PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> And, you said that this way of implementation would be slow because
> there could be many objects in dead caches and this implementation
> needs the node spin_lock on each object freeing. Is it no problem now?
>
> If you have any performance data about this implementation and
> the alternative one, could you share it?

I ran some tests on a 2 CPU x 6 core x 2 HT box. The kernel was compiled
with a config taken from a popular distro, so it had most of the debug
options turned off.

---

TEST #1: Each logical CPU executes a task that frees 1M objects
allocated from the same cache. All frees are node-local.

RESULTS:

objsize (bytes) | cache is dead? | objects free time (ms)
----------------+----------------+-----------------------
             64 |       -        |    373 +-  5
                |       +        |   1300 +-  6
                |                |
            128 |       -        |    387 +-  6
                |       +        |   1337 +-  6
                |                |
            256 |       -        |    484 +-  4
                |       +        |   1407 +-  6
                |                |
            512 |       -        |    686 +-  5
                |       +        |   1561 +- 18
                |                |
           1024 |       -        |   1073 +- 11
                |       +        |   1897 +- 12

TEST #2: Each logical CPU executes a task that removes 1M empty files
from its own RAMFS mount. All frees are node-local.

RESULTS:

cache is dead? | files removal time (s)
---------------+-----------------------
      -        |  15.57 +- 0.55 (base)
      +        |  16.80 +- 0.62 (base + 8%)

---

So, according to TEST #1 the relative slowdown introduced by zapping per
cpu arrays is really dreadful - it can be up to 4x! However, the
absolute numbers aren't that huge - ~1 second for 24 million objects. If
we do something other than kfree, the slowdown shouldn't be that visible
IMO.

TEST #2 is an attempt to estimate how zapping of per cpu arrays will
affect FS object destruction, which is the most common use case for dead
caches. To avoid disk-bound operations it uses RAMFS. From the test
results it follows that the relative slowdown of massive file deletion
is within 2 stdev, which looks decent.

Anyway, the alternative approach (reaping dead caches periodically)
won't have this kfree slowdown at all. However, periodic reaping can
become a real disaster as the system evolves and the number of dead
caches grows. Currently I don't know how we can estimate the real life
effects of this. If you have any ideas, please let me know.

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Tue, Jun 24, 2014 at 04:38:41PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> > @@ -3462,6 +3474,17 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
> >
> >  	kmemcheck_slab_free(cachep, objp, cachep->object_size);
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> > +	if (unlikely(!ac)) {
> > +		int nodeid = page_to_nid(virt_to_page(objp));
> > +
> > +		spin_lock(&cachep->node[nodeid]->list_lock);
> > +		free_block(cachep, &objp, 1, nodeid);
> > +		spin_unlock(&cachep->node[nodeid]->list_lock);
> > +		return;
> > +	}
> > +#endif
> > +
>
> And, please document intention of this code. :)

Sure.

> And, you said that this way of implementation would be slow because
> there could be many objects in dead caches and this implementation
> needs the node spin_lock on each object freeing. Is it no problem now?

It may be :(

> If you have any performance data about this implementation and
> the alternative one, could you share it?

I haven't (shame on me!). I'll do some testing today and send you the
results.

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
Hi,

On Tue, Jun 24, 2014 at 04:25:54PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> > @@ -3368,7 +3379,8 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
> >
> >  		/* fixup slab chains */
> >  		if (page->active == 0) {
> > -			if (n->free_objects > n->free_limit) {
> > +			if (n->free_objects > n->free_limit ||
> > +			    memcg_cache_dead(cachep)) {
>
> I'd like to set free_limit to 0 in __kmem_cache_shrink() rather than do
> the memcg_cache_dead() test here, because memcg_cache_dead() is more
> expensive. Is there any problem with this way?

We'd have to be careful on cpu hotplug then, because it may update the
free_limit. Not a big problem though. Will fix.

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of free objects/slabs for such
> caches, otherwise they will be hanging around forever.
>
> For SLAB that means we must disable per cpu free object arrays and make
> free_block always discard empty slabs irrespective of node's free_limit.
>
> To disable per cpu arrays, we free them on kmem_cache_shrink (see
> drain_cpu_caches -> do_drain) and make __cache_free fall back to
> free_block if there is no per cpu array. Also, we have to disable
> allocation of per cpu arrays on cpu hotplug for dead caches (see
> cpuup_prepare, __do_tune_cpucache).
>
> After we disabled free objects/slabs caching, there is no need to reap
> those caches periodically. Moreover, it will only result in slowdown. So
> we also make cache_reap skip them.
>
> Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
> ---
>  mm/slab.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index b3af82419251..7e91f5f1341d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1210,6 +1210,9 @@ static int cpuup_prepare(long cpu)
>  		struct array_cache *shared = NULL;
>  		struct array_cache **alien = NULL;
>
> +		if (memcg_cache_dead(cachep))
> +			continue;
> +
>  		nc = alloc_arraycache(node, cachep->limit,
>  					cachep->batchcount, GFP_KERNEL);
>  		if (!nc)
> @@ -2411,10 +2414,18 @@ static void do_drain(void *arg)
>
>  	check_irq_off();
>  	ac = cpu_cache_get(cachep);
> +	if (!ac)
> +		return;
> +
>  	spin_lock(&cachep->node[node]->list_lock);
>  	free_block(cachep, ac->entry, ac->avail, node);
>  	spin_unlock(&cachep->node[node]->list_lock);
>  	ac->avail = 0;
> +
> +	if (memcg_cache_dead(cachep)) {
> +		cachep->array[smp_processor_id()] = NULL;
> +		kfree(ac);
> +	}
>  }
>
>  static void drain_cpu_caches(struct kmem_cache *cachep)
> @@ -3368,7 +3379,8 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
>
>  		/* fixup slab chains */
>  		if (page->active == 0) {
> -			if (n->free_objects > n->free_limit) {
> +			if (n->free_objects > n->free_limit ||
> +			    memcg_cache_dead(cachep)) {
>  				n->free_objects -= cachep->num;
>  				/* No need to drop any previously held
>  				 * lock here, even if we have a off-slab slab
> @@ -3462,6 +3474,17 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
>
>  	kmemcheck_slab_free(cachep, objp, cachep->object_size);
>
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (unlikely(!ac)) {
> +		int nodeid = page_to_nid(virt_to_page(objp));
> +
> +		spin_lock(&cachep->node[nodeid]->list_lock);
> +		free_block(cachep, &objp, 1, nodeid);
> +		spin_unlock(&cachep->node[nodeid]->list_lock);
> +		return;
> +	}
> +#endif
> +

And, please document intention of this code. :)

And, you said that this way of implementation would be slow because
there could be many objects in dead caches and this implementation
needs the node spin_lock on each object freeing. Is it no problem now?

If you have any performance data about this implementation and
the alternative one, could you share it?

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of free objects/slabs for such
> caches, otherwise they will be hanging around forever.
>
> For SLAB that means we must disable per cpu free object arrays and make
> free_block always discard empty slabs irrespective of node's free_limit.
>
> To disable per cpu arrays, we free them on kmem_cache_shrink (see
> drain_cpu_caches -> do_drain) and make __cache_free fall back to
> free_block if there is no per cpu array. Also, we have to disable
> allocation of per cpu arrays on cpu hotplug for dead caches (see
> cpuup_prepare, __do_tune_cpucache).
>
> After we disabled free objects/slabs caching, there is no need to reap
> those caches periodically. Moreover, it will only result in slowdown. So
> we also make cache_reap skip them.
>
> Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
> ---
>  mm/slab.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index b3af82419251..7e91f5f1341d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1210,6 +1210,9 @@ static int cpuup_prepare(long cpu)
>  		struct array_cache *shared = NULL;
>  		struct array_cache **alien = NULL;
>
> +		if (memcg_cache_dead(cachep))
> +			continue;
> +
>  		nc = alloc_arraycache(node, cachep->limit,
>  					cachep->batchcount, GFP_KERNEL);
>  		if (!nc)
> @@ -2411,10 +2414,18 @@ static void do_drain(void *arg)
>
>  	check_irq_off();
>  	ac = cpu_cache_get(cachep);
> +	if (!ac)
> +		return;
> +
>  	spin_lock(&cachep->node[node]->list_lock);
>  	free_block(cachep, ac->entry, ac->avail, node);
>  	spin_unlock(&cachep->node[node]->list_lock);
>  	ac->avail = 0;
> +
> +	if (memcg_cache_dead(cachep)) {
> +		cachep->array[smp_processor_id()] = NULL;
> +		kfree(ac);
> +	}
>  }
>
>  static void drain_cpu_caches(struct kmem_cache *cachep)
> @@ -3368,7 +3379,8 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
>
>  		/* fixup slab chains */
>  		if (page->active == 0) {
> -			if (n->free_objects > n->free_limit) {
> +			if (n->free_objects > n->free_limit ||
> +			    memcg_cache_dead(cachep)) {

Hello, Vladimir.

I'd like to set free_limit to 0 in __kmem_cache_shrink() rather than do
the memcg_cache_dead() test here, because memcg_cache_dead() is more
expensive. Is there any problem with this way?

Thanks.
Re: [PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
On Fri, Jun 13, 2014 at 12:38:22AM +0400, Vladimir Davydov wrote:
> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of free objects/slabs for such
> caches, otherwise they will be hanging around forever.
>
> For SLAB that means we must disable per cpu free object arrays and make
> free_block always discard empty slabs irrespective of node's free_limit.

An alternative to this could be making cache_reap, which periodically
drains per cpu arrays and drops free slabs for all caches, shrink dead
caches aggressively. The patch doing this is attached. This approach has
its pros and cons compared to disabling per cpu arrays.

Pros:

 - Less intrusive: it only requires modification of cache_reap.
 - Doesn't impact performance: the free path isn't touched.

Cons:

 - Delays dead cache destruction: the lag between the last object being
   freed and the cache being destroyed isn't constant. It depends on the
   number of kmem-active memcgs and the number of dead caches (the more
   of them, the longer it'll take to shrink dead caches). Also, on NUMA
   machines the upper bound will be proportional to the number of NUMA
   nodes, because alien caches are reaped one at a time (see reap_alien).
 - If there are a lot of dead caches, periodic shrinking will be slowed
   down even for active caches (see cache_reap).
--
diff --git a/mm/slab.c b/mm/slab.c
index 9ca3b87edabc..811fdb214b9e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3980,6 +3980,11 @@ static void cache_reap(struct work_struct *w)
 		goto out;
 
 	list_for_each_entry(searchp, &slab_caches, list) {
+		int force = 0;
+
+		if (memcg_cache_dead(searchp))
+			force = 1;
+
 		check_irq_on();
 
 		/*
@@ -3991,7 +3996,7 @@ static void cache_reap(struct work_struct *w)
 
 		reap_alien(searchp, n);
 
-		drain_array(searchp, n, cpu_cache_get(searchp), 0, node);
+		drain_array(searchp, n, cpu_cache_get(searchp), force, node);
 
 		/*
 		 * These are racy checks but it does not matter
@@ -4002,15 +4007,17 @@ static void cache_reap(struct work_struct *w)
 
 		n->next_reap = jiffies + REAPTIMEOUT_NODE;
 
-		drain_array(searchp, n, n->shared, 0, node);
+		drain_array(searchp, n, n->shared, force, node);
 
 		if (n->free_touched)
 			n->free_touched = 0;
 		else {
-			int freed;
+			int freed, tofree;
+
+			tofree = force ? slabs_tofree(searchp, n) :
+				DIV_ROUND_UP(n->free_limit, 5 * searchp->num);
 
-			freed = drain_freelist(searchp, n, (n->free_limit +
-				5 * searchp->num - 1) / (5 * searchp->num));
+			freed = drain_freelist(searchp, n, tofree);
 			STATS_ADD_REAPED(searchp, freed);
 		}
 next:
[PATCH -mm v3 8/8] slab: do not keep free objects/slabs on dead memcg caches
Since a dead memcg cache is destroyed only after the last slab allocated
to it is freed, we must disable caching of free objects/slabs for such
caches, otherwise they will be hanging around forever.

For SLAB that means we must disable per cpu free object arrays and make
free_block always discard empty slabs irrespective of node's free_limit.

To disable per cpu arrays, we free them on kmem_cache_shrink (see
drain_cpu_caches -> do_drain) and make __cache_free fall back to
free_block if there is no per cpu array. Also, we have to disable
allocation of per cpu arrays on cpu hotplug for dead caches (see
cpuup_prepare, __do_tune_cpucache).

After we disabled free objects/slabs caching, there is no need to reap
those caches periodically. Moreover, it will only result in slowdown. So
we also make cache_reap skip them.

Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
---
 mm/slab.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index b3af82419251..7e91f5f1341d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1210,6 +1210,9 @@ static int cpuup_prepare(long cpu)
 		struct array_cache *shared = NULL;
 		struct array_cache **alien = NULL;
 
+		if (memcg_cache_dead(cachep))
+			continue;
+
 		nc = alloc_arraycache(node, cachep->limit,
 					cachep->batchcount, GFP_KERNEL);
 		if (!nc)
@@ -2411,10 +2414,18 @@ static void do_drain(void *arg)
 
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
+	if (!ac)
+		return;
+
 	spin_lock(&cachep->node[node]->list_lock);
 	free_block(cachep, ac->entry, ac->avail, node);
 	spin_unlock(&cachep->node[node]->list_lock);
 	ac->avail = 0;
+
+	if (memcg_cache_dead(cachep)) {
+		cachep->array[smp_processor_id()] = NULL;
+		kfree(ac);
+	}
 }
 
 static void drain_cpu_caches(struct kmem_cache *cachep)
@@ -3368,7 +3379,8 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 
 		/* fixup slab chains */
 		if (page->active == 0) {
-			if (n->free_objects > n->free_limit) {
+			if (n->free_objects > n->free_limit ||
+			    memcg_cache_dead(cachep)) {
 				n->free_objects -= cachep->num;
 				/* No need to drop any previously held
 				 * lock here, even if we have a off-slab slab
@@ -3462,6 +3474,17 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
 
 	kmemcheck_slab_free(cachep, objp, cachep->object_size);
 
+#ifdef CONFIG_MEMCG_KMEM
+	if (unlikely(!ac)) {
+		int nodeid = page_to_nid(virt_to_page(objp));
+
+		spin_lock(&cachep->node[nodeid]->list_lock);
+		free_block(cachep, &objp, 1, nodeid);
+		spin_unlock(&cachep->node[nodeid]->list_lock);
+		return;
+	}
+#endif
+
 	/*
 	 * Skip calling cache_free_alien() when the platform is not numa.
 	 * This will avoid cache misses that happen while accessing slabp (which
@@ -3803,6 +3826,9 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
 	struct ccupdate_struct *new;
 	int i;
 
+	if (memcg_cache_dead(cachep))
+		return 0;
+
 	new = kzalloc(sizeof(*new) + nr_cpu_ids * sizeof(struct array_cache *),
 		      gfp);
 	if (!new)
@@ -3988,6 +4014,9 @@ static void cache_reap(struct work_struct *w)
 	list_for_each_entry(searchp, &slab_caches, list) {
 		check_irq_on();
 
+		if (memcg_cache_dead(searchp))
+			continue;
+
 		/*
 		 * We only take the node lock if absolutely necessary and we
 		 * have established with reasonable certainty that
-- 
1.7.10.4