This version of the patch is intended for ReadyKernel only.

If the Node is highly loaded and there are a lot of mem cgroups to
scan/reclaim, it's quite possible to race with some memcg death and
restart the reclaimer loop again and again, causing kswapd to work
forever even if there is a lot of free memory already.

More precisely: kswapd may loop forever in shrink_zone() in the
        do {} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)))

loop if mem_cgroup_iter() restarts from the root memory cgroup again and
again.
The latter can happen if memory cgroups are created/destroyed
faster than shrink_zone() can walk all memory cgroups in a single pass.

Let's stop kswapd once we see that mem_cgroup_iter() has restarted
several times in shrink_zone(). This might keep kswapd from reclaiming
as thoroughly as it otherwise would, but it will definitely prevent the
situation where permanent kswapd work drives the node into
swapin/swapout while the node has a lot of free RAM.
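
The idea can be illustrated outside the kernel with a small userspace
sketch. It is only a model under stated assumptions, not the kernel
code: struct walk_sim, sim_iter_next() and bounded_walk() are
hypothetical names invented for this illustration, index 0 stands in
for root_mem_cgroup, and a "memcg death" is simulated as a forced
restart every churn_every iterator calls.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the memcg hierarchy walk: index 0 plays the
 * role of root_mem_cgroup, indices 1..nr_groups-1 are child groups.
 */
struct walk_sim {
	int nr_groups;   /* groups in the hierarchy, including the root */
	int churn_every; /* simulate a group death every N calls; 0 = never */
	int steps;       /* iterator calls made so far */
};

/*
 * Return the next index to visit, or -1 once a full pass has completed.
 * A simulated group death forces a restart from the root (index 0),
 * modelling the restart behaviour described in the text above.
 */
static int sim_iter_next(struct walk_sim *w, int pos)
{
	w->steps++;
	if (w->churn_every && w->steps % w->churn_every == 0)
		return 0;	/* raced with a death: restart from the root */
	if (pos + 1 >= w->nr_groups)
		return -1;	/* walked every group: pass is complete */
	return pos + 1;
}

/*
 * Walk all groups, but give up once the root has been seen more than
 * max_restarts + 1 times. Returns true if a full pass completed and
 * false if the walk bailed out because of repeated restarts.
 */
static bool bounded_walk(struct walk_sim *w, int max_restarts)
{
	int root_seen = 0;
	int pos = 0;

	assert(max_restarts >= 0);
	do {
		/* ... reclaim from group 'pos' would happen here ... */
		if (pos == 0 && root_seen++ > max_restarts)
			return false;	/* restarted too often: stop */
	} while ((pos = sim_iter_next(w, pos)) >= 0);

	return true;
}
```

With no churn the walk finishes a full pass. With churn faster than one
pass, the unbounded loop would never terminate, while bounded_walk()
returns after a fixed number of restarts at the cost of never visiting
the groups behind the restart point.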

https://jira.sw.ru/browse/PSBM-123655

Signed-off-by: Konstantin Khorenko <[email protected]>
---
 mm/vmscan.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 81a115313582..7695bb1d4d72 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2657,6 +2657,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc,
        unsigned long nr_reclaimed, nr_scanned;
        bool slab_only = sc->slab_only;
        bool retry;
+       int root_memcg_counter = 0;
 
        do {
                struct mem_cgroup *root = sc->target_mem_cgroup;
@@ -2724,6 +2725,20 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc,
                                mem_cgroup_iter_break(root, memcg);
                                break;
                        }
+
+                       /*
+                        * We can loop in this cycle forever on global reclaim
+                        * if our Node has a lot of mem cgroups && disk is slow
+                        * && mem cgroups are created/destroyed often.
+                        * In this case mem_cgroup_iter() will restart the loop
+                        * from the root memcg again and again. Forever.
+                        */
+                       if (global_reclaim(sc) &&
+                           (memcg == root_mem_cgroup) &&
+                           (root_memcg_counter++ > 5)) {
+                               mem_cgroup_iter_break(root, memcg);
+                               break;
+                       }
                } while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
 
                if ((!sc->has_inactive || !sc->nr_reclaimed)
-- 
2.24.3
