I think WebSphere is a bit of a red herring. The trigger is probably any
process that asks for a lot more memory than the system has, for some value of
"a lot".
The size of the array in the Perl script depends on how much memory your system
has. Fifty million works well with the 1 GB test server I've been
experimenting with; that's very close to filling all of swap. When we run the
Perl script there, we don't get the date output. My guess is that it's so
close to the edge on memory that forking a child process to run date fails
for lack of space.
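Below is a rough Perl sketch for picking the array size on a guest of a different size. It reads MemTotal and SwapTotal from /proc/meminfo and divides a chosen share of RAM plus swap by an assumed per-element cost. The helper names (elements_for, meminfo_kb), the 32-byte cost and the 95% target are assumptions for illustration, not values measured in these tests. Calibrating the cost against a known data point, such as fifty million elements nearly filling swap on the 1 GB guest, would make the estimate more useful.

```perl
use strict;
use warnings;

# ASSUMPTION: rough bytes per element (an integer scalar plus its array
# slot) on a 64-bit perl. This is not a documented constant; measure it
# on your own build before trusting the result.
use constant BYTES_PER_ELEMENT => 32;

# Hypothetical helper: element count whose estimated footprint is
# $fraction of RAM plus swap. Sizes are in kB, as /proc/meminfo reports.
sub elements_for {
    my ($mem_kb, $swap_kb, $fraction) = @_;
    my $budget_bytes = ($mem_kb + $swap_kb) * 1024 * $fraction;
    return int($budget_bytes / BYTES_PER_ELEMENT);
}

# Hypothetical helper: read a "Key:   12345 kB" line from /proc/meminfo.
# Returns undef if the file or the key is missing.
sub meminfo_kb {
    my ($key) = @_;
    open my $fh, '<', '/proc/meminfo' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^\Q$key\E:\s+(\d+)\s+kB/;
    }
    return undef;
}

if (-r '/proc/meminfo') {
    my $mem_kb  = meminfo_kb('MemTotal')  // 0;
    my $swap_kb = meminfo_kb('SwapTotal') // 0;
    # ASSUMPTION: aim for 95% of RAM + swap, i.e. deliberately near the edge.
    printf "MemTotal %d kB, SwapTotal %d kB: try \$num=%d\n",
        $mem_kb, $swap_kb, elements_for($mem_kb, $swap_kb, 0.95);
}
```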
I've tried downgrading the kernel from the current 3.0.82-0.7-default to the
original 3.0.80-0.5-default. That seems to make things worse, not better;
kswapd0 starts to go crazy as soon as the memory allocation needs to go into
swap space. It took almost two hours to allocate the array and then about
forty seconds each pass to step through it. On a guest with sufficient memory
the whole script takes about forty seconds for each pass with no long delay
while kswapd spins. (I altered this slightly so it doesn't fork off a call to
date.)
tc-ossm-test2-tedrb> perl -e 'print "Starting at ", scalar(localtime()),
"\n"; $num=52_000_000; @a=(0..$num); foreach $j (0..4) { $a[$_] = $_
foreach (0..$num); $|=1; print "Done with pass $j at", scalar(localtime()),
"\n" ; } print "Done\n"'
Starting at Mon Nov 4 17:23:24 2013
Done with pass 0 atMon Nov 4 19:14:10 2013
Done with pass 1 atMon Nov 4 19:14:45 2013
Done with pass 2 atMon Nov 4 19:15:25 2013
Done with pass 3 atMon Nov 4 19:16:02 2013
Done with pass 4 atMon Nov 4 19:16:39 2013
Done
52.472u 93.386s 1:53:30.26 2.1% 0+0k 406292304+0io 1331567pf+0w
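The timestamps above can be turned into per-pass durations with a short Perl sketch. It assumes the log lines look exactly like the ones shown, including the missing space in "atMon". On this log it reports 6646 seconds (about 1 hour 51 minutes) for pass 0, which includes building the array, and 35 to 40 seconds for each later pass. It also recomputes the CPU share from the csh time summary. At about 2%, the process spent nearly all of its elapsed time off the CPU. The helper names are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert a scalar(localtime()) string such as "Mon Nov 4 17:23:24 2013"
# to epoch seconds. timegm is used only so that differences between two
# stamps are not skewed by a daylight-saving change.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($mon, $mday, $h, $m, $s, $y) =
        $stamp =~ /\w{3}\s+(\w{3})\s+(\d+)\s+(\d+):(\d+):(\d+)\s+(\d{4})/
        or die "unparsable timestamp: $stamp\n";
    return timegm($s, $m, $h, $mday, $MON{$mon}, $y);
}

# Seconds between consecutive "Starting at" / "Done with pass N at" lines.
# The regex tolerates the missing space in "atMon" from the original run.
sub pass_durations {
    my @epochs;
    for my $line (@_) {
        push @epochs, stamp_to_epoch($1)
            if $line =~ /(?:Starting at|Done with pass \d+ at)\s*(.+)$/;
    }
    return map { $epochs[$_] - $epochs[$_ - 1] } 1 .. $#epochs;
}

# CPU share from a csh/tcsh `time` summary: (user + system) / elapsed.
sub cpu_percent {
    my ($line) = @_;
    my ($user, $sys, $elapsed) = $line =~ /^([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)/
        or die "unparsable time line: $line\n";
    my $secs = 0;
    $secs = $secs * 60 + $_ for split /:/, $elapsed;   # h:mm:ss.ss or mm:ss.ss
    return 100 * ($user + $sys) / $secs;
}

my @log = (
    'Starting at Mon Nov 4 17:23:24 2013',
    'Done with pass 0 atMon Nov 4 19:14:10 2013',
    'Done with pass 1 atMon Nov 4 19:14:45 2013',
    'Done with pass 2 atMon Nov 4 19:15:25 2013',
    'Done with pass 3 atMon Nov 4 19:16:02 2013',
    'Done with pass 4 atMon Nov 4 19:16:39 2013',
);
print join(', ', pass_durations(@log)), " seconds\n";   # 6646, 35, 40, 37, 37 seconds
printf "%.1f%% CPU\n", cpu_percent('52.472u 93.386s 1:53:30.26 2.1% 0+0k');  # 2.1% CPU
```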
(We also tried the earlier suggestion of setting MALLOC_ARENA_MAX to 1. That
doesn't seem to affect this at all.)
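To repeat the MALLOC_ARENA_MAX comparison from a script, the variable must be in the environment of the process under test, because glibc reads it when malloc initializes. The Perl sketch below starts a child perl with the variable set and restores the parent's environment afterwards. run_with_env is a hypothetical helper, and the one-million-element test is only a placeholder size. A plausible reason the setting makes no difference here is that glibc creates extra malloc arenas only for additional threads, and this one-liner is single-threaded.

```perl
use strict;
use warnings;

# Run a Perl snippet in a child perl with extra environment variables set.
# glibc's malloc reads MALLOC_ARENA_MAX when it initializes, early in the
# child's startup, so the variable has to be in the child's environment;
# assigning it inside an already-running script would come too late.
sub run_with_env {
    my ($env, $code, @args) = @_;
    local @ENV{ keys %$env } = values %$env;   # restored when we return
    open my $fh, '-|', $^X, '-e', $code, @args
        or die "cannot start $^X: $!";
    my $out = do { local $/; <$fh> };
    close $fh or warn "child exited with status $?\n";
    return $out;
}

# Usage: the same small allocation with and without the setting.
# The element count here is a hypothetical, deliberately small value.
my $test = 'my @a = (0 .. $ARGV[0]); print scalar(@a), "\n"';
print "default:            ", run_with_env({}, $test, 1_000_000);
print "MALLOC_ARENA_MAX=1: ", run_with_env({ MALLOC_ARENA_MAX => 1 }, $test, 1_000_000);
```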
In a way, it's remarkable that this little script ever finishes. We would
never be able to go so far into swap space if we were swapping to physical disk
on a PC.
Thanks for all the suggestions!
Ted Rodriguez-Bell
Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
(415) 222-4516 office (415) 516-7913 cell
45 Fremont Street 11th floor, MAC A0194-112, San Francisco, CA 94105
[email protected] or http://www.vtext.com text paging (but cell is safer)
-----Original Message-----
From: Mark Post [mailto:[email protected]]
Sent: Monday, November 04, 2013 9:54 AM
Subject: Re: SLES 11 SP3 problems
>>> On 11/1/2013 at 03:54 PM, Marcy Cortes <[email protected]>
>>> wrote:
> This may trigger it on a 1G server
> perl -e '$num=50_000_000; @a=(0..$num); foreach $j (0..4) { $a[$_] = $_
> foreach (0..$num); $|=1; print "Done with pass $j at", `date` ; } print
> "Done\n"'
>
> (thanks to my colleague Ted for figuring that out).
I'm not able to replicate this problem on a plain SLES11 SP3 system with the
kernel updated to kernel-default-3.0.93-0.8.2. If I run the above perl script,
the only thing that happens is an "Out of memory!" message.
If I drop the number from 50_000_000 to 15_000_000, it will run to completion.
I don't see a lot of %sys time in top, however. I see spikes in %user (as
expected) and %wait. The %wait seems to be due to all the paging I/O being
done. If I turn off the paging device, the oom-killer gets called and kills
the script.
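For experiments like dropping the count from 50_000_000 to 15_000_000, an expanded form of the test script that takes the element count on the command line may be easier to work with. This sketch runs the same loop as the one-liner and, like Ted's modified version, uses localtime() rather than forking date. The run_passes helper is hypothetical.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so progress shows even if the run stalls in swap

# Build an array of $num + 1 integers, then rewrite every element $passes
# times, printing a timestamp after each pass. scalar(localtime()) is used
# instead of `date`, so no child process is forked near the memory limit.
sub run_passes {
    my ($num, $passes) = @_;
    print "Starting at ", scalar(localtime()), "\n";
    my @a = (0 .. $num);
    for my $j (0 .. $passes - 1) {
        $a[$_] = $_ for 0 .. $num;
        print "Done with pass $j at ", scalar(localtime()), "\n";
    }
    print "Done\n";
    return scalar @a;    # element count, handy for a quick sanity check
}

# Only run when an element count is given, e.g. 15000000 or 50000000.
if (@ARGV) {
    my ($num) = $ARGV[0] =~ /^(\d+)$/
        or die "usage: $0 ELEMENTS\n";
    run_passes($num, 5);
}
```

Usage, assuming the file is saved under the hypothetical name passes.pl: perl passes.pl 15000000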
So, from my perspective, on my test system, things are working as they should,
but I don't have WAS installed or being deployed, so the parallels are nowhere
near as close as I would like. Hopefully the NTS guys will be able to figure
out what's going on with your systems.
Mark Post
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
----------------------------------------------------------------------