I think that SLES7 runs the 2.4 kernel. The old 2.4 kernel OOM killer has problems like the one you describe. The 2.6 kernel OOM killer is much better.
-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of Tom Duerbusch
Sent: Saturday, February 24, 2007 9:36 AM
To: [email protected]
Subject: Re: Linux OOM

Previously, you said that OOM killed processes in a semi-random manner. That is my experience too.

I ran that rexx exec I posted earlier as a Regina rexx exec under Linux, just to see what would happen. Yep, OOM never killed the memory hog. But it did kill about everything else. This was under SLES7 during my early testing. It looked to me like it was killing everything else, trying to make room for that memory hog process. Not the way I wanted things done. Eventually it did kill my rexx process. But by then the machine was virtually useless.

So I've always kept swap large enough that OOM doesn't get invoked, that is, unless I'm the one causing the weird strange behavior. <G>

Not everyone on this listserv is a qualified VM Systems Programmer. There are many on the VM listserv who are relatively new also. One of the original comments was that their VM Systems Programmer didn't allow (or took back) the vdisks from the Linux machine(s).

My tangent on all of this was that if my system was memory constrained, vdisks can cause more memory problems (VM paging). Swap to disk. Sure, it is slower. Isn't that what Linux in an LPAR does?

In a memory-constrained VM system, I'll easily trade more disk I/O for a smaller working set size or reduced paging. Then, go to management to make a case for more memory. But z9 memory costs $8K/GB (list price) in increments of 8 GB (i.e. $64K list). That isn't something you're going to get this week.

Of course, when I say I will trade more I/O, I'm assuming there is more I/O to trade. If the dasd subsystem is already overloaded, and you're paging, perhaps the extra workload being put on should wait. Of course, it never waits. Once management makes a decision, logically, they want it yesterday.

Tom Duerbusch
THD Consulting

The action gets written to the system log.
It would be possible to have something looking through the log for those events. On a system that has exhausted its virtual memory, that might be counterproductive. If you transmit your syslog events to a separate logging system, that wouldn't be an issue.

> Like I said before, when I've found OOM taking place, it is time to
> cycle the machine. It is much easier and quicker to cycle than to
> correct the problem and restart all cancelled tasks. Have you found any
> different? I hate to think that as OOM was cancelling things, that it
> was producing a script that can be used to restart all cancelled tasks
> (or something to that effect).

The whole point of the OOM killer is to try to kill the "bad guy" while leaving the rest of the system up and running. If the bad guy was your main application (perhaps it has a big memory leak, for example), then recycling the system may be the best thing to do. If your mission-critical application is still up and running fine, you may want to consider restarting the killed processes. Or, you might want to leave them dead if they're just going to cause problems again. No general case can be made either way.

Mark Post

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
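
As a rough illustration of the "something looking through the log" idea mentioned in this thread, here is a minimal Python sketch. It is not anything the posters describe using. The message wording matched by the pattern is an assumption (kernel versions phrase the OOM killer message differently), the default path /var/log/messages is an assumption about where syslog writes kernel messages, and the names find_oom_kills and scan_log are hypothetical.

```python
import re

# Assumed message shape, e.g. "Out of Memory: Killed process 1234 (regina)."
# The exact wording differs between kernel versions, so adjust as needed.
OOM_PATTERN = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)",
    re.IGNORECASE,
)


def find_oom_kills(lines):
    """Return a (pid, command) pair for each line that looks like an OOM kill."""
    kills = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def scan_log(path="/var/log/messages"):
    """Scan a syslog file (path is an assumption) for OOM killer events."""
    with open(path, errors="replace") as log:
        return find_oom_kills(log)
```

Calling scan_log() on the separate logging host, rather than on the memory-starved guest itself, would fit the point made above about not adding work to a system that has already run out of virtual memory. The resulting list of killed commands could then be reviewed by hand before deciding whether to restart them or leave them dead.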
