Hi Daniel,

I have seen the __alloc_pages message in /var/log/messages, as well as the
"VM: Killing process" message.

One of the guests is 1280 MB, the other 1792 MB. They were initially sized at
1024 MB and 1536 MB to accommodate the number of WebSphere JVMs that are
deployed. I was using sar to watch memory usage, and before the guest memory
change both guests showed 99% memory used, or close to that.

On the last test I ran "sar -r 1 10000" to see the sar stats in real time.
The guest that hung (the one with 1280 MB) showed 99% memory used and 100%
swap used in its last entry.

When I run "sar -r" to show all of today's memory stats, I see gaps in the
samples (07:20 to 08:40 and 11:30 to 12:00). These may well correspond to the
test period, which might indicate the system was having trouble at that time:

Time     kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached kbswpfree kbswpused  %swpused
06:50:01    288612   1003072     77.66         0     62856    122192     19188         0      0.00
07:00:00    287860   1003824     77.71         0     62972    122196     19188         0      0.00
07:10:00    288560   1003124     77.66         0     63108    122200     19188         0      0.00
07:20:00    288288   1003396     77.68         0     63140    122208     19188         0      0.00
08:40:00   1106488    185196     14.34         0     10748     66932     19188         0      0.00
08:50:00   1044940    246744     19.10         0     13156     85024     19188         0      0.00
09:00:01   1042608    249076     19.28         0     14228     85028     19188         0      0.00
09:10:00   1042092    249592     19.32         0     15284     85028     19188         0      0.00
09:20:00   1040920    250764     19.41         0     16344     85036     19188         0      0.00
09:30:00   1033536    258148     19.99         0     17640     89132     19188         0      0.00
09:40:00    791072    500612     38.76         0     20000     97124     19188         0      0.00
09:50:00    574448    717236     55.53         0     22128    116556     19188         0      0.00
10:00:00    420700    870984     67.43         0     24028    116748     19188         0      0.00
10:10:00    361164    930520     72.04         0     24996    116828     19188         0      0.00
10:20:00    356560    935124     72.40         0     26160    116852     19188         0      0.00
10:30:00    349332    942352     72.96         0     27944    116892     19188         0      0.00
10:40:00    347436    944248     73.10         0     29692    116924     19188         0      0.00
10:50:01    342628    949056     73.47         0     31324    116948     19188         0      0.00
11:00:00    339592    952092     73.71         0     32608    116952     19188         0      0.00
11:10:00    337632    954052     73.86         0     33860    116992     19188         0      0.00
11:20:00    332472    959212     74.26         0     35124    116996     19188         0      0.00
11:30:00    330344    961340     74.43         0     36564    117016     19188         0      0.00
12:00:00   1043196    248488     19.24         0     12780     84752     19188         0      0.00
12:10:00    659188    632496     48.97         0     15280    106392     19188         0      0.00
12:20:00    659508    632176     48.94         0     16500    106648     19188         0      0.00
12:30:01    656844    634840     49.15         0     17424    107164     19188         0      0.00
12:40:00    617020    674664     52.23         0     18696    113900     19188         0      0.00
12:50:00    548940    742744     57.50         0     19932    167984     19188         0      0.00
13:00:00    546816    744868     57.67         0     20864    167988     19188         0      0.00
Average:    416444    875240     67.76         0     46987    117352     19188         0      0.00

Both guests have a 19188k swap disk (VDISK) defined.

I am using grinder to put the guests under load, but this is nowhere near the
sort of load I want to achieve. The grinder test runs for approx. 2-3 hours
before a guest will hang. If you log on to a hung guest using z/VM you get
the reconnected message, but no other response.

Troy.

On Wed, 15 Sep 2004 07:54:24 -0400, Daniel Jarboe <[EMAIL PROTECTED]> wrote:
> If you watch the console long enough, do you see anything like:
>
> __alloc_pages: 0-order allocation failed
>
> or
>
> VM: killing process?
>
> How large did you define the guest? How much is a "small amount" of
> swap? What kind of workload? What error message do you get when you
> try to log onto the linux console?
>
> Also, by default SLES8 kicks off sa every 10 minutes. If you're lucky
> you might get one of these "hangs" shortly after one of these 10 minute
> intervals. It might be interesting to check out the system activity
> data. Something like:
>
> /usr/bin/sar -u -r -f /var/log/sa/sa.2004_09_15
>
> You can further narrow it if you want by specifying a start and end
> time, like:
>
> /usr/bin/sar -s 08:00:00 -e 09:00:00 -u -r -f /var/log/sa/sa.2004_09_15
>
> ~ Daniel
>
> > -----Original Message-----
> > From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
> > Troyski
> > Sent: Wednesday, September 15, 2004 6:58 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Guest freeze
> >
> > Just to let you know; I removed the samba client RPM from two of my
> > guests that exhibited the hang problem, but at least one still
> > shows the problem.
> > I still think the problem is memory-related,
> > possibly a memory leak, as the guest that hung had reached 99% of
> > memory and 100% of swap (albeit a very small amount of swap) just
> > before it hung.
> >
> > Troy.
> >
> > On Tue, 14 Sep 2004 13:32:10 -0400, Joe Poole <[EMAIL PROTECTED]> wrote:
> > > We've seen it when a Samba smbd session goes into a loop when someone,
> > > probably trying to back up his C drive, or store the family jpegs, runs
> > > the server out of space. The solution is to find the process ID with
> > > top and kill that session.
> > >
> > > On Tuesday 14 September 2004 13:20, you wrote:
> > > > Do you know what specifically triggers the problem? I was just asked
> > > > if we could do samba.....
> > > >
> > > > From: "Seader, Cameron" <[EMAIL PROTECTED]er.com>
> > > > Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
> > > > To: [EMAIL PROTECTED]
> > > > cc:
> > > > Date: 09/14/2004 08:33 AM
> > > > Subject: Re: Guest freeze
> > > > Please respond to: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
> > > >
> > > > Here is some more information about what we had discovered after
> > > > talking with SuSE. This problem was never resolved.
> > > >
> > > > Just an update on the bug found yesterday. I just got off the phone
> > > > talking with Novell's SUSE support department and we were able to
> > > > narrow down which package was causing the conflict with the
> > > > openldap2-client library. Samba-2.2.8a has a service called nmb,
> > > > which is the netbios service and has hooks in the openldap2 library
> > > > files. We narrowed it down to the nmb service: when it is started,
> > > > it causes a loop in the kernel process ksoftirqd_CPU0. This
> > > > investigation has been sent off to the Development labs at SUSE in
> > > > Germany. They will either find a fix for the current package and
> > > > send that out globally, send us a beta version of SLES 9 (which is
> > > > on beta 2 right now), or have us wait until SLES 9 is released.
> > > > Another option is to download the Samba 3.0 sources and compile
> > > > them for our platform ourselves, which would go out of the support
> > > > bounds for SUSE.
> > > >
> > > > -Cameron Seader
> > > >
> > > > -----Original Message-----
> > > > From: Troyski [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, September 14, 2004 07:13
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Guest freeze
> > > >
> > > > Hi all,
> > > >
> > > > Anybody seen the following conditions :-
> > > >
> > > > o VM linux guest hangs. No response to ssh or at 3270 console.
> > > >   Pings from other servers (external and internal) ok though (?)
> > > > o VM CPU @ 100%.
> > > > o No VM paging.
> > > > o VM guys say linux is "looping".
> > > >
> > > > zSeries 800/SLES8 SP2
> > > >
> > > > Would a linux guest memory issue cause this?
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
>

--
---oOo---
Troyski (Public Email)
.........................................
:: Web/PGP Key :: www.troyski.co.uk ::
:: MSN [EMAIL PROTECTED] :: YIM troy_muller ::
:: ICQ #31206542 :: AIM trymul ::
:: GS500E XJ600S :: LCDR Troyski ::
.........................................
:: NCOS || CQC || SCI100 || SCI101 ::
:: NUR101 || SCC#60975 ::
.........................................
'He who dies with the most toys - wins.' <Unknown>
---oOo---
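
The gaps in the sar -r samples discussed in this thread can also be located mechanically rather than by eye. What follows is only a minimal sketch in Python, assuming the plain-text sar -r layout quoted in the message (timestamp in the first column) and the default 10-minute sa interval that Daniel mentions. The function name find_gaps and its parameters are hypothetical, and the sketch does not handle a log that wraps past midnight.

```python
from datetime import datetime, timedelta


def find_gaps(lines, interval_minutes=10, slack_seconds=60):
    """Return (previous, current) timestamp pairs with missing samples in between.

    Hypothetical helper: assumes each data row starts with HH:MM:SS and that
    samples are expected every `interval_minutes` (plus a little slack).
    """
    allowed = timedelta(minutes=interval_minutes, seconds=slack_seconds)
    times = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            times.append(datetime.strptime(fields[0], "%H:%M:%S"))
        except ValueError:
            continue  # skip header rows and the "Average:" row
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > allowed:
            gaps.append((prev.strftime("%H:%M:%S"), cur.strftime("%H:%M:%S")))
    return gaps


# A contiguous excerpt of the samples quoted in the message.
sample = """\
07:00:00 287860 1003824 77.71 0 62972 122196 19188 0 0.00
07:10:00 288560 1003124 77.66 0 63108 122200 19188 0 0.00
07:20:00 288288 1003396 77.68 0 63140 122208 19188 0 0.00
08:40:00 1106488 185196 14.34 0 10748 66932 19188 0 0.00
08:50:00 1044940 246744 19.10 0 13156 85024 19188 0 0.00
09:00:01 1042608 249076 19.28 0 14228 85028 19188 0 0.00
09:10:00 1042092 249592 19.32 0 15284 85028 19188 0 0.00
Average: 416444 875240 67.76 0 46987 117352 19188 0 0.00
""".splitlines()

for start, end in find_gaps(sample):
    print(f"no samples between {start} and {end}")
```

Fed the full output of the sar command Daniel suggested, the same function would flag every window in which sa failed to record a sample, which can then be compared against the times the grinder tests were running.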
