Since the 2.6 kernel, the oom killer has been slightly biased away from
CAP_SYS_ADMIN processes by discounting some of their memory usage in
comparison to other processes.

This has always been implicit, and nothing really relies on the behavior.

Gaurav noticed that __task_cred() can dereference a potentially freed
pointer if the task under consideration is exiting, because no reference
to the task_struct is held.

Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
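
For illustration only, the arithmetic touched by this patch can be modelled
in userspace. This is a rough sketch, not kernel code: badness_sketch() and
its privileged/apply_bias parameters are hypothetical names, with privileged
standing in for has_capability_noaudit(p, CAP_SYS_ADMIN), and only the lines
visible in the hunk are modelled.

```c
/*
 * Userspace model of the lines around the removed hunk (hypothetical helper).
 *   points     - the task's memory usage in pages
 *   adj        - the task's oom_score_adj value
 *   totalpages - the memory available to the allocation, in pages
 *   privileged - stand-in for has_capability_noaudit(p, CAP_SYS_ADMIN)
 *   apply_bias - 1 models the old behavior, 0 models this patch
 */
static long badness_sketch(long points, long adj, long totalpages,
                           int privileged, int apply_bias)
{
        /* The 3% discount that this patch removes. */
        if (apply_bias && privileged)
                points -= (points * 3) / 100;

        /* Normalize to oom_score_adj units */
        adj *= totalpages / 1000;
        points += adj;

        return points;
}
```

With points = 100000, adj = 0 and totalpages = 1000000, the old behavior
yields 97000 for a privileged task and the new behavior yields 100000, the
same as for any other task. Note that adj is scaled by totalpages / 1000, so
an oom_score_adj value is relative to total memory, not to the task's own
usage as the removed discount was.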

If a CAP_SYS_ADMIN process wants to keep a bias against being selected,
it can always adjust its /proc/pid/oom_score_adj.
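
A minimal sketch of how a process might do that from userspace, assuming the
usual oom_score_adj range of -1000 to 1000. set_oom_score_adj() is a
hypothetical helper, not an existing API, and the kernel may still reject the
write depending on the caller's privileges.

```c
#include <stdio.h>

/*
 * Write adj to an oom_score_adj file such as /proc/self/oom_score_adj.
 * Hypothetical helper. Returns 0 on success, -1 on failure.
 */
static int set_oom_score_adj(const char *path, int adj)
{
        FILE *f;
        int ok;

        /* Reject values outside the documented range before touching the file. */
        if (adj < -1000 || adj > 1000)
                return -1;

        f = fopen(path, "w");
        if (!f)
                return -1;

        ok = fprintf(f, "%d\n", adj) > 0;

        /* The write may only be flushed, and rejected, at fclose() time. */
        if (fclose(f) != 0)
                ok = 0;

        return ok ? 0 : -1;
}
```

A task could call set_oom_score_adj("/proc/self/oom_score_adj", -30) early
in its startup. Negative values make the task less likely to be selected,
positive values more likely.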

Reported-by: Gaurav Kohli <gko...@codeaurora.org>
Signed-off-by: David Rientjes <rient...@google.com>
---
 mm/oom_kill.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -224,13 +224,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
                mm_pgtables_bytes(p->mm) / PAGE_SIZE;
        task_unlock(p);
 
-       /*
-        * Root processes get 3% bonus, just like the __vm_enough_memory()
-        * implementation used by LSMs.
-        */
-       if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-               points -= (points * 3) / 100;
-
        /* Normalize to oom_score_adj units */
        adj *= totalpages / 1000;
        points += adj;
