On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> > Workingset code was recently made memcg aware, but shadow node shrinker
> > is still global. As a result, one small cgroup can consume all memory
> > available for shadow nodes, possibly hurting other cgroups by reclaiming
> > their shadow nodes, even though reclaim distances stored in its shadow
> > nodes have no effect. To avoid this, we need to make shadow node
> > shrinker memcg aware.
> > 
> > Signed-off-by: Vladimir Davydov <[email protected]>
> 
> This patch is straightforward, but there is one tiny thing that bugs
> me about it, and that is switching from available memory to the size
> of the active list. Because the active list can shrink drastically at
> runtime.

Yeah, the active file lru is indeed a volatile thing. Not only can it
shrink rapidly, it can also grow in an instant (e.g. due to
mark_page_accessed), so you're right - sizing the shadow node lru based
solely on the active lru size would be too unpredictable.

> 
> It's true that both the shrinking of the active list and subsequent
> activations to regrow it will reduce the number of actionable
> refaults, and so it wouldn't be unreasonable to also shrink shadow
> nodes when the active list shrinks.
> 
> However, I think these are too many assumptions to encode in the
> shrinker, because it is only meant to prevent a worst-case explosion
> of radix tree nodes. I'd prefer it to be dumb and conservative.
> 
> Could we instead go with the current usage of the memcg? Whether
> reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory available to the
> group, while making fewer assumptions about the internal composition
> of the memcg's memory and the consequences associated with that.

But that would likely result in wasting a considerable chunk of memory
on stale shadow nodes when file caches constitute only a small part of
the memcg's memory consumption, which isn't good IMHO.

Maybe we'd better use LRU_ALL_FILE / 2 instead?

diff --git a/mm/workingset.c b/mm/workingset.c
index 8c07cd8af15e..8a75f8d2916a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -351,9 +351,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
        if (memcg_kmem_enabled())
                pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
-                                                    BIT(LRU_ACTIVE_FILE));
+                                                    LRU_ALL_FILE);
        else
-               pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+               pages = node_page_state(sc->nid, NR_ACTIVE_FILE) +
+                       node_page_state(sc->nid, NR_INACTIVE_FILE);
 
        /*
         * Active cache pages are limited to 50% of memory, and shadow
@@ -369,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
         *
         * PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
         */
-       max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+       max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
 
        if (shadow_nodes <= max_nodes)
                return 0;
