On Tue, 03 Jul 2018 18:09:26 +0300 Kirill Tkhai <ktk...@virtuozzo.com> wrote:

> Imagine a big node with many cpus, memory cgroups and containers.
> Suppose we have 200 containers, and every container has 10 mounts
> and 10 cgroups. No container's tasks touch another container's
> mounts. If there is intensive page writing and global reclaim
> happens, a writing task has to iterate over all memcgs to shrink
> slab before it is able to go to shrink_page_list().
> 
> Iteration over all the memcg slabs is very expensive:
> the task has to visit 200 * 10 = 2000 shrinkers
> for every memcg, and since there are 2000 memcgs,
> the total calls are 2000 * 2000 = 4000000.
> 
> So, the shrinker makes 4 million do_shrink_slab() calls
> just to try to isolate SWAP_CLUSTER_MAX pages in one
> of the actively writing memcgs via shrink_page_list().
> I've observed a node spending almost 100% of its time in the
> kernel, making useless iterations over already-shrunk slabs.
> 
> This patch adds a bitmap of memcg-aware shrinkers to memcg.
> The size of the bitmap depends on bitmap_nr_ids, and during
> the memcg's life it is maintained to be large enough to fit
> bitmap_nr_ids shrinkers. Every bit in the map corresponds to
> a shrinker id.
> 
> The next patches will keep a bit set only for a memcg that is
> really charged. This will allow shrink_slab() to improve its
> performance significantly. See the last patch for the numbers.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -182,6 +182,11 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
>       if (id < 0)
>               goto unlock;
>  
> +     if (memcg_expand_shrinker_maps(id)) {
> +             idr_remove(&shrinker_idr, id);
> +             goto unlock;
> +     }
> +
>       if (id >= shrinker_nr_max)
>               shrinker_nr_max = id + 1;
>       shrinker->id = id;

This function ends up being a rather sad little thing.

: static int prealloc_memcg_shrinker(struct shrinker *shrinker)
: {
:       int id, ret = -ENOMEM;
: 
:       down_write(&shrinker_rwsem);
:       id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
:       if (id < 0)
:               goto unlock;
: 
:       if (memcg_expand_shrinker_maps(id)) {
:               idr_remove(&shrinker_idr, id);
:               goto unlock;
:       }
: 
:       if (id >= shrinker_nr_max)
:               shrinker_nr_max = id + 1;
:       shrinker->id = id;
:       ret = 0;
: unlock:
:       up_write(&shrinker_rwsem);
:       return ret;
: }

- there's no need to call memcg_expand_shrinker_maps() unless id >=
  shrinker_nr_max, so why not move the code and avoid calling
  memcg_expand_shrinker_maps() in most cases?

- why aren't we decreasing shrinker_nr_max in
  unregister_memcg_shrinker()?  That's easy to do, avoids pointless
  work in shrink_slab_memcg() and avoids memory waste in future
  prealloc_memcg_shrinker() calls.

  It should be possible to find the highest ID in an IDR tree with a
  straightforward descent of the underlying radix tree, but I doubt if
  that has been wired up.  Otherwise a simple loop in
  unregister_memcg_shrinker() would be needed.
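
The two suggestions above can be sketched roughly as follows. This is a
userspace model, not kernel code: a plain bool array stands in for
shrinker_idr, there is no locking, and model_expand_maps(),
model_prealloc(), model_unregister(), MAX_IDS and expand_calls are
hypothetical names invented for the illustration. It only shows the
control flow: expand the maps when id >= shrinker_nr_max and not
otherwise, and walk shrinker_nr_max back down with a simple loop on
unregister.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 64            /* hypothetical cap, only for this model */

static bool id_used[MAX_IDS]; /* stand-in for shrinker_idr */
static int shrinker_nr_max;   /* one past the highest ID in use */
static int expand_calls;      /* how often the stand-in below was called */

/* Stand-in for memcg_expand_shrinker_maps(); always succeeds here. */
static int model_expand_maps(int id)
{
	(void)id;
	expand_calls++;
	return 0;
}

/* Allocate the lowest free ID; expand the maps only if the ID is new. */
static int model_prealloc(void)
{
	int id;

	for (id = 0; id < MAX_IDS; id++)
		if (!id_used[id])
			break;
	if (id == MAX_IDS)
		return -1;

	if (id >= shrinker_nr_max) {
		if (model_expand_maps(id))
			return -1;	/* nothing to undo: ID not yet taken */
		shrinker_nr_max = id + 1;
	}
	id_used[id] = true;
	return id;
}

/* Release an ID, then lower shrinker_nr_max past any free top IDs. */
static void model_unregister(int id)
{
	assert(id >= 0 && id < MAX_IDS && id_used[id]);
	id_used[id] = false;
	while (shrinker_nr_max > 0 && !id_used[shrinker_nr_max - 1])
		shrinker_nr_max--;
}
```

In this model, reusing a freed ID below shrinker_nr_max never reaches
model_expand_maps(), and freeing the highest ID lowers shrinker_nr_max
to just above the next ID still in use.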

