Re: Linux OOM

Tom Duerbusch Sat, 24 Feb 2007 09:36:59 -0800

Previously, you said that OOM killed processes in a semi-random manner.
That is my experience too.


I ran that rexx exec I posted earlier as a Regina rexx exec under
Linux, just to see what would happen.  Yep, OOM never killed the memory
hog, but it did kill about everything else.  This was under SLES7
during my early testing.  It looked to me like it was killing everything
else, trying to make room for that memory hog process.  Not the way I
wanted things done.  Eventually it did kill my rexx process, but by then
the machine was virtually useless.  So I've always kept swap large
enough that OOM doesn't get invoked, that is, unless I'm the one causing
the weird strange behavior. <G>
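
One way to keep an eye on whether swap is still "large enough" is to
watch how much of it is in use.  The following is only a minimal sketch,
assuming the "SwapTotal" and "SwapFree" lines that Linux reports in
/proc/meminfo in kB; the function names and the 80% warning threshold
are hypothetical choices for illustration, not anything from the
original test.

```python
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Collect the 'Name:  value kB' lines of /proc/meminfo into a dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Only accept lines shaped like "SwapTotal:  1048568 kB"; skip anything else.
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1].isdigit() and parts[2] == "kB"):
            values[parts[0][:-1]] = int(parts[1])
    return values


def swap_used_fraction(meminfo: Dict[str, int]) -> Optional[float]:
    """Return the fraction of swap in use, or None when no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return None
    return (total - meminfo.get("SwapFree", 0)) / total


def swap_is_tight(path: str = "/proc/meminfo", threshold: float = 0.8) -> bool:
    """True when swap usage is at or above the (hypothetical) warning threshold."""
    with open(path) as handle:
        used = swap_used_fraction(parse_meminfo(handle.read()))
    return used is not None and used >= threshold
```

Run from cron or a monitoring agent, swap_is_tight() could raise a
warning well before the OOM killer gets involved.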

Not everyone on this listserv is a qualified VM Systems Programmer.
There are many on the VM listserv who are relatively new, too.  One of
the original comments was that their VM Systems Programmer didn't allow
(or took back) the vdisks from the Linux machine(s).  My tangent on
all of this was that if my system were memory constrained, vdisks could
cause more memory problems (VM paging).  Swap to disk.  Sure, it is
slower.  Isn't that what Linux in an LPAR does?  In a memory constrained
VM system, I'll easily trade more disk I/O for a smaller working set
size or reduced paging.  Then, go to management to make a case for more
memory.  But on a z9, memory costs $8K/GB (list price) in increments of
8 GB (i.e. $64K list).  That isn't something you're going to get this
week.

Of course, when I say I will trade more I/O, I'm assuming there is
more I/O to trade.  If the dasd subsystem is already overloaded, and
you're paging, perhaps the extra workload being put on should wait.  Of
course, it never waits.  Once management makes a decision, logically,
they want it yesterday.

Tom Duerbusch
THD Consulting


The action gets written to the system log.  It would be possible to
have something looking through the log for those events.  On a virtual
memory exhausted system, that might be counter productive.  If you
transmit your syslog events to a separate logging system, that wouldn't
be an issue.
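
As a rough illustration of "something looking through the log", here is
a minimal sketch that picks OOM kill events out of syslog lines.  The
message pattern is an assumption: the kernel's exact wording differs
between versions, so the regular expression would need to be checked
against real log entries.  The function names and the default log path
are hypothetical as well.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shape, e.g. "Out of Memory: Killed process 1234 (httpd)."
# The wording varies by kernel version; adjust after checking real logs.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, command name) for every line that looks like an OOM kill."""
    kills = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def scan_log(path: str = "/var/log/messages") -> List[Tuple[int, str]]:
    """Scan one log file; the default path is a guess and differs by distribution."""
    with open(path, errors="replace") as log:
        return find_oom_kills(log)
```

Running this on the separate logging host rather than on the guest
itself avoids adding load to a system that is already out of memory.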

> Like I said before, when I've found OOM taking place, it is time to
> cycle the machine.  It is much easier and quicker to cycle than to
> correct the problem and restart all cancelled tasks.  Have you found
> any different?  I hate to think that as OOM was cancelling things,
> that it was producing a script that can be used to restart all
> cancelled tasks (or something to that effect).

The whole point of the OOM killer is to try to kill the "bad guy" while
leaving the rest of the system up and running.  If the bad guy was your
main application (perhaps it has a big memory leak for example), then
recycling the system may be the best thing to do.  If your mission
critical application is still up and running fine, you may want to
consider restarting the killed processes.  Or, you might want to leave
them dead if they're just going to cause problems again.  No general
case can be made either way.


Mark Post

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390
or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
