On 02/09/2015 03:43 AM, Remy Dernat wrote:
On 09/02/2015 03:56, Christopher Samuel wrote:
On 07/02/15 14:57, Alan Louis Scheinine wrote:
Only problem I've seen is that if a user allocates too much memory,
OOM killer can kill maintenance processes such as a scheduler daemon.
This is why we disable overcommit. :-)
Hi,
I have already seen that problem on our master. The scheduler, SGE, ran out
of memory and the OOM killer decided to kill it:
Dec 1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963
(sge_qmaster) score 948 or sacrifice child
I resolved that issue by disabling "schedd_job_info" in SGE with
"qconf -msconf".
However, this setting gives significant information about our jobs.
How should I adjust the OOM killer? Should I set
vm.overcommit_memory = 2?
To be clear, setting vm.overcommit_memory doesn't directly affect the
behavior of the OOM killer. Turning off overcommit prevents the Linux
virtual memory system from making promises it can't always keep, which
reduces/eliminates the need for the OOM Killer.
Setting vm.overcommit_memory = 2 turns off overcommitting and is the
best choice if you want to avoid the OOM Killer.
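
One thing to check before switching: in mode 2 the kernel refuses
allocations once the committed total reaches CommitLimit, which (per the
kernel's overcommit documentation, and assuming vm.overcommit_kbytes is 0)
is roughly (RAM - huge pages) * vm.overcommit_ratio / 100 + swap. The
default ratio is 50, so on a node with little swap about half the RAM
becomes unusable unless you raise it. Here is a minimal sketch of that
arithmetic. The function names and the sample figures are made up for
illustration; on a real node you would feed it the contents of
/proc/meminfo.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit (kB) when vm.overcommit_memory = 2.

    Assumes vm.overcommit_kbytes is 0, so vm.overcommit_ratio applies.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Hypothetical node: 65536000 kB of RAM and 8192000 kB of swap.
SAMPLE_MEMINFO = """\
MemTotal:       65536000 kB
SwapTotal:       8192000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
limit = commit_limit_kb(info["MemTotal"], info["SwapTotal"], overcommit_ratio=50)
print(limit)  # 40960000 kB with the default ratio of 50
```

With the ratio raised to 90, the same hypothetical node would allow
67174400 kB. The kernel reports the value it actually uses as CommitLimit
in /proc/meminfo, so compare against that after changing
vm.overcommit_memory and vm.overcommit_ratio with sysctl.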
--
Prentice