I think WebSphere is a bit of a red herring.  The trigger is probably any 
process that asks for a lot more memory than the system has, for some value of 
a lot.

The size of the array in the Perl script depends on how much memory your system 
has.  Fifty million works well with the 1 GB test server I've been 
experimenting with.  That's very close to filling all of swap.  When we run the 
Perl script there we don't get the date output.  My guess is that the system is 
so close to the edge on memory that forking a shell and then running date fails 
for lack of space.

I've tried downgrading the kernel from the current 3.0.80-0.5-default to the 
original 3.0.82-0.7-default.  That seems to make things worse, not better; 
kswapd0 starts to go crazy as soon as the memory allocation needs to go into 
swap space.  It took almost two hours to allocate the array and then about 
forty seconds each pass to step through it.  On a guest with sufficient memory 
the whole script takes about forty seconds for each pass with no long delay 
while kswapd spins.  (I altered the script slightly so it doesn't fork off a 
call to date.)
  tc-ossm-test2-tedrb> perl -e 'print "Starting at ", scalar(localtime()), 
"\n"; $num=52_000_000;      @a=(0..$num); foreach $j (0..4) { $a[$_] = $_ 
foreach (0..$num); $|=1; print "Done with pass $j at",   scalar(localtime()), 
"\n" ; }  print "Done\n"'
Starting at Mon Nov  4 17:23:24 2013
Done with pass 0 atMon Nov  4 19:14:10 2013
Done with pass 1 atMon Nov  4 19:14:45 2013
Done with pass 2 atMon Nov  4 19:15:25 2013
Done with pass 3 atMon Nov  4 19:16:02 2013
Done with pass 4 atMon Nov  4 19:16:39 2013
Done
52.472u 93.386s 1:53:30.26 2.1% 0+0k 406292304+0io 1331567pf+0w

(We also tried the earlier suggestion of setting MALLOC_ARENA_MAX to 1.  That 
doesn't seem to affect this at all.)

In a way, it's remarkable that this little script ever finishes.  We would 
never be able to go so far into swap space if we were swapping to physical disk 
on a PC.

Thanks for all the suggestions!

Ted Rodriguez-Bell
Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
-----Original Message-----
From: Mark Post [mailto:[email protected]] 
Sent: Monday, November 04, 2013 9:54 AM
Subject: Re: SLES 11 SP3 problems

>>> On 11/1/2013 at 03:54 PM, Marcy Cortes <[email protected]> 
>>> wrote: 
> This may trigger it on a 1G server
> perl -e '$num=50_000_000; @a=(0..$num); foreach $j (0..4) { $a[$_] = $_ 
> foreach (0..$num); $|=1; print "Done with pass $j at", `date` ; }  print 
> "Done\n"'
> 
> (thanks to my colleague Ted for figuring that out).

I'm not able to replicate this problem on a plain SLES11 SP3 system with the 
kernel updated to kernel-default-3.0.93-0.8.2.  If I run the above perl script, 
the only thing that happens is an "Out of memory!" message.

If I drop the number from 50_000_000 to 15_000_000, it will run to completion.  
I don't see a lot of %sys time in top, however.  I see spikes in %user (as 
expected) and %wait.  The %wait seems to be due to all the paging I/O being 
done.  If I turn off the paging device, the oom-killer gets called and kills 
the script.

So, from my perspective, on my test system, things are working as they should, 
but I don't have WAS installed or being deployed, so the parallels are nowhere 
near as close as I would like.  Hopefully the NTS guys will be able to figure 
out what's going on with your systems.


Mark Post

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
