Hi All,

We have an issue where Linux kills off GPFS first when a machine runs out 
of memory. We are running GPFS 3.5.

We believe this happens when user processes have exhausted memory and swap, and 
the out-of-memory (OOM) killer in Linux kills the GPFS daemon as the 
largest user of memory, due to its large pinned memory footprint.
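For background, Linux exposes each process's current badness score in 
/proc/<pid>/oom_score, so a quick way to see which processes the OOM killer 
would target first (a generic sketch, not GPFS-specific) is:

```shell
#!/bin/bash
# List the five processes the OOM killer would consider first,
# sorted by /proc/<pid>/oom_score (higher scores are killed first).
for d in /proc/[0-9]*; do
    pid=${d#/proc/}
    score=$(cat "$d/oom_score" 2>/dev/null) || continue
    printf '%s %s\n' "$score" "$pid"
done | sort -rn | head -5
```

On an affected node, mmfsd typically tops this list, which is why it gets 
picked.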

When GPFS is killed, the whole cluster blocks for a minute or so before it 
resumes operation. This is not ideal, and it causes issues across most of the 
cluster.

What we see is users unable to log in elsewhere on the cluster until we have 
powered off the node. We believe this is because, while the node is still 
pingable, GPFS doesn't expel it from the cluster.

This issue mainly occurs on the login nodes of our HPC cluster but can affect 
the rest of the cluster when it occurs.

I've seen others on the list with this issue.

We've come up with a solution to adjust the OOM score of GPFS, so that it is 
unlikely to be the first thing to be killed, and hopefully the OOM killer picks 
a user process instead.

We've tested this and it seems to work. I'm asking here firstly to share our 
knowledge and secondly to ask if there is anything we've missed with this 
solution.

It's short, which is part of its beauty.

/usr/local/sbin/gpfs-oom_score_adj

<pre>
#!/bin/bash

# Lower the OOM score of every GPFS (mmfs*) process so that the
# OOM killer prefers to kill user processes instead of the daemon.
for proc in $(pgrep mmfs); do
    echo -500 > "/proc/$proc/oom_score_adj"
done
</pre>
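After running the script you can check that the adjustment took effect; this 
prints each matched PID alongside its current oom_score_adj value (it prints 
nothing if no mmfs* processes are running on the node):

```shell
#!/bin/bash
# Show the current oom_score_adj for each GPFS (mmfs*) process.
for proc in $(pgrep mmfs); do
    echo "$proc: $(cat /proc/$proc/oom_score_adj)"
done
```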

This can then be called automatically on GPFS startup with the following:

<pre>
mmaddcallback startupoomkiller --command /usr/local/sbin/gpfs-oom_score_adj --event startup
</pre>

Then either restart GPFS or just run the script by hand on all nodes.

Peter Childs
ITS Research Infrastructure
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
