How the Linux OOM killer works

Most admins have probably experienced failures caused by applications that leak memory, or worse yet, that consume all of the virtual memory (physical memory + swap) on a host. The Linux kernel deals with memory exhaustion in an interesting way: the Linux OOM killer. When invoked, the OOM killer begins terminating processes in order to free up enough memory to keep the system operational. I was curious how the OOM killer worked, so I decided to spend some time reading through the linux/mm/oom_kill.c Linux kernel source file to see what it does.

The OOM killer uses a point system to pick which processes to kill. The points are assigned by the badness() function, which contains the following block comment:

/**
 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */

The actual code in this function does the following:

- Processes that have the PF_SWAPOFF flag set will be killed first

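- Processes that fork a lot of child processes are likely candidates, since half of each child's memory is added to the parent's score

- Niced processes are typically less important, so their score is doubled

- Superuser processes are usually more important, so their score is reduced to make them less likely to be killed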
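To make the rules above concrete, here is a minimal sketch of how they might combine into a single score. This is not the kernel's code: the struct `task_info`, its fields, and the function `sketch_badness()` are hypothetical, and the exact adjustments (returning the maximum score for swapoff, adding half of each child's memory plus one page, doubling niced tasks, quartering superuser tasks) are assumptions modeled on the 2.6-era badness() that should be checked against the source of your own kernel version.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a task; the real kernel reads these
 * values from task_struct and mm_struct. */
struct task_info {
    unsigned long total_vm;        /* pages of virtual memory in use */
    bool swapoff;                  /* stands in for the PF_SWAPOFF flag */
    int nice;                      /* nice value, -20..19 */
    bool superuser;                /* stands in for the capability checks */
    const unsigned long *child_vm; /* total_vm of each child with its own mm */
    size_t nchildren;
};

/* Assumed scoring rules, modeled loosely on the 2.6-era badness(). */
unsigned long sketch_badness(const struct task_info *t)
{
    /* A process running swapoff is chosen before anything else. */
    if (t->swapoff)
        return ULONG_MAX;

    /* The process's memory size is the base of the score. */
    unsigned long points = t->total_vm;

    /* Forking parents absorb half of each child's memory. */
    for (size_t i = 0; i < t->nchildren; i++)
        points += t->child_vm[i] / 2 + 1;

    /* Niced processes are considered less important: double the score. */
    if (t->nice > 0)
        points *= 2;

    /* Superuser processes are protected by quartering the score. */
    if (t->superuser)
        points /= 4;

    return points;
}
```

With these assumed rules, a niced superuser process using 1000 pages scores 500, while a parent using 1000 pages with two children of 400 and 600 pages scores 1502. The process with the highest score is the one the OOM killer would target.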

The code also takes into account how long the process has been running: processes that have accumulated more CPU time and run time receive fewer points, which may or may not be a good thing. It’s interesting to see how technologies we take for granted actually work, and this exercise really helped me understand what the fields in the task_struct structure are used for. Now to dig into mm_struct. :)
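The time adjustment can be sketched as dividing the score by the integer square root of CPU time and by the fourth root of run time, both measured in coarse units. The helpers `isqrt()` and `sketch_time_scale()` are hypothetical, and the units (seconds shifted right by 3 and by 10) are assumptions; the kernel works from jiffies and its own shift constants, so treat this as an illustration of the shape of the formula rather than the real arithmetic.

```c
/* Integer square root (floor) by Newton's method. */
static unsigned long isqrt(unsigned long x)
{
    unsigned long r = x, y;

    if (x < 2)
        return x;
    y = x / 2 + (x & 1);        /* first Newton step, (x+1)/2 without overflow */
    while (y < r) {
        r = y;
        y = (r + x / r) / 2;
    }
    return r;
}

/* Hypothetical helper: scale a base score by elapsed time.
 * cpu_secs and run_secs are plain seconds; the shifts below are assumed
 * coarse units, not the kernel's exact ones. */
unsigned long sketch_time_scale(unsigned long points,
                                unsigned long cpu_secs,
                                unsigned long run_secs)
{
    unsigned long cpu_time = cpu_secs >> 3;   /* units of about 8 s */
    unsigned long run_time = run_secs >> 10;  /* units of about 1024 s */

    /* More CPU time means fewer points (divide by square root). */
    if (cpu_time)
        points /= isqrt(cpu_time);
    /* Longer run time means fewer points (divide by fourth root). */
    if (run_time)
        points /= isqrt(isqrt(run_time));
    return points;
}
```

Under these assumptions, a score of 1000 for a process that has used 128 seconds of CPU and has been up for 16384 seconds drops to 125. Because of the roots, a process has to run for a long time before the discount becomes large, which is why memory size still dominates the score.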

matty on September 30, 2009 | Filed Under Linux Kernel

7 Comments

Robert Milkowski on October 1st, 2009

The OOM killer is in Linux mostly to work around problems caused by memory overcommitting. Linux is slowly moving in the direction of getting rid of the memory overcommitting approach (by tweaking /proc you have been able to disable it to some extent for some time now). The truth is that memory overcommitting + OOM killer is a bad thing: semi-randomly killing applications because the system allowed them to allocate more virtual memory than it has is just plain stupid in most environments. But as I said, Linux is slowly catching up and getting rid of that unpleasant feature.

btw: in most other Unixes, like Solaris, an OOM killer is not needed, as the system generally won’t allow memory overcommitment.

implicate_order on October 5th, 2009

We’ve run into the OOM killer while running some critical billing apps in our environment (the vendor won’t support anything besides RHEL4 on HP DL580s). Don’t ask me why…we’ve beaten ourselves silly over this.

In any case, their application runs single-threaded processes bound to specific CPUs and has extensive memory leaks. As a result, the OOM killer kills processes semi-randomly and has caused significant damage by panicking the system(s).

Woo on October 7th, 2009

I’m really surprised that Linux contains such a weird feature. I wonder what state of mind a coder must be in to think of a feature that more or less randomly kills processes, when the only sane reaction would be to return ENOMEM to the offending malloc and let the offending application handle the error itself.
This really sounds like a basis for entertaining debug sessions… especially as root-owned processes seem only to be avoided rather than fully exempted. How long until this feature decides to kill an important daemon like nfsd or other critical infrastructure processes which sends the whole box to go boom?

Robert Milkowski on October 8th, 2009

Woo: the problem with Linux is that by default it overcommits memory. Basically, when you do a malloc on Linux it doesn’t reserve any backing store (memory + swap disk) and always returns successfully. Then if you have a couple of programs and all of them actually do want to use the memory, the problem starts: the system is running out of memory, but there is basically no interface to tell the applications, as it has already told all of them that there is enough… so it starts killing applications in order to avoid a complete lockup. In Solaris, every time you malloc() some memory the system reserves the required space by default, so you won’t end up in such a situation.
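The /proc tweak mentioned in these comments is the vm.overcommit_memory setting, exposed as /proc/sys/vm/overcommit_memory: 0 is the default heuristic mode, 1 always overcommits, and 2 switches to strict accounting, where allocations beyond the commit limit fail and malloc() returns NULL instead of the OOM killer stepping in later. The sketch below simply reads and labels the current mode; the function names `read_overcommit_mode()` and `overcommit_mode_name()` are hypothetical.

```c
#include <stdio.h>

/* Map the vm.overcommit_memory value to a short description. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0: return "heuristic overcommit (default)";
    case 1: return "always overcommit";
    case 2: return "strict accounting, no overcommit";
    default: return "unknown";
    }
}

/* Read the current mode from procfs; returns -1 if it cannot be read
 * (for example, on a non-Linux system). */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A caller would print `overcommit_mode_name(read_overcommit_mode())` to see how the host is configured. Changing the mode requires root (for example, writing 2 to the same file), and in strict mode it is worth checking the commit limit first, since a limit that is too low will make ordinary allocations fail.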

Julien Gabel on October 19th, 2009

> How long until this feature decides to kill an important daemon like nfsd or other critical infrastructure processes which sends the whole box to go boom?

Not too long in my case:
http://blog.thilelli.net/post/2006/12/09/Memory-Behaviour%3A-Tuning-Linuxs-Kernel-Overcommit

jowblow on April 8th, 2010

AIX does the same thing.

chm0dvii on April 13th, 2010

AIX does the same thing, it just uses a different program to do it.
