We should take memory.oom_guarantee into account even for unlimited
cgroups in order to handle system slices and VM cgroups properly. Since
we have no idea what a VM's limit actually is, let's rework the formula
behind the worst cgroup selection. Instead of using

  (usage - guarantee) / (limit - guarantee)

let's just take

  usage / guarantee

This looks reasonable: the more a cgroup exceeds its guarantee, the
sooner it will be selected by the OOM killer.

Note that after this change, containers without a configured guarantee
will be the primary OOM victims, but that looks fair enough. Among such
containers, the one with the greatest memory consumption will be
selected, which is also fine.

https://jira.sw.ru/browse/PSBM-44683

Signed-off-by: Vladimir Davydov <[email protected]>
---
 mm/memcontrol.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6009ff5d1903..af39f25df67f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1702,18 +1702,14 @@ struct oom_context *mem_cgroup_oom_context(struct mem_cgroup *memcg)
 
 unsigned long mem_cgroup_overdraft(struct mem_cgroup *memcg)
 {
-	unsigned long long guarantee, limit, usage;
-	unsigned long score;
+	unsigned long long guarantee, usage;
 
-	guarantee = ACCESS_ONCE(memcg->oom_guarantee);
-	limit = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
-	usage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
-
-	if (limit >= RESOURCE_MAX || guarantee >= limit || usage <= guarantee)
+	if (mem_cgroup_is_root(memcg))
 		return 0;
 
-	score = div64_u64(1000 * (usage - guarantee), limit - guarantee);
-	return score > 0 ? score : 1;
+	guarantee = ACCESS_ONCE(memcg->oom_guarantee);
+	usage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
+	return div64_u64(1000 * usage, guarantee + 1);
 }
 
 unsigned long mem_cgroup_total_pages(struct mem_cgroup *memcg, bool swap)
-- 
2.1.4

_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
