Hi Daniel,

I have seen the __alloc_pages message in /var/log/messages as well as
the VM: Killing process message.
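
For counting how often these two messages show up, rather than paging through the log by hand, something like the following might help. It is only a sketch: the patterns are taken from the two messages quoted in this thread, and the name count_oom_messages is made up for illustration.

```python
import re
from collections import Counter

# Patterns based on the two kernel messages quoted in this thread.
# The exact wording may differ between kernel levels (assumption).
PATTERNS = {
    "alloc_failed": re.compile(r"__alloc_pages: \d+-order allocation failed"),
    "vm_kill": re.compile(r"VM: killing process", re.IGNORECASE),
}


def count_oom_messages(lines):
    """Count how many lines match each out-of-memory pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Typical use (path as mentioned above):
#     with open("/var/log/messages", errors="replace") as fh:
#         print(dict(count_oom_messages(fh)))
```

The function takes any iterable of lines, so it can be fed an open file or a list of lines pasted from the console.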

One of the guests is 1280 MB, the other 1792 MB. They were initially sized
at 1024 MB and 1536 MB to accommodate the number of WebSphere JVMs that are
deployed.

I was using sar to see what the memory usage was, and prior to the
guest memory change both guests showed 99% memory used or close to
that.

On the last test I ran sar -r 1 10000 to watch the sar stats in
real time, and the last entry for the guest that hung showed 99% memory
used and 100% swap used. This was the guest with 1280 MB.

When I run sar -r to show all of today's memory stats I see gaps in
the samples. These may well correspond to the test periods, which could
indicate the system was having trouble at those times:

06:50:01       288612   1003072     77.66         0     62856    122192     19188         0      0.00
07:00:00       287860   1003824     77.71         0     62972    122196     19188         0      0.00
07:10:00       288560   1003124     77.66         0     63108    122200     19188         0      0.00
07:20:00       288288   1003396     77.68         0     63140    122208     19188         0      0.00
08:40:00      1106488    185196     14.34         0     10748     66932     19188         0      0.00
08:50:00      1044940    246744     19.10         0     13156     85024     19188         0      0.00
09:00:01      1042608    249076     19.28         0     14228     85028     19188         0      0.00
09:10:00      1042092    249592     19.32         0     15284     85028     19188         0      0.00
09:20:00      1040920    250764     19.41         0     16344     85036     19188         0      0.00
09:30:00      1033536    258148     19.99         0     17640     89132     19188         0      0.00
09:40:00       791072    500612     38.76         0     20000     97124     19188         0      0.00
09:50:00       574448    717236     55.53         0     22128    116556     19188         0      0.00
10:00:00       420700    870984     67.43         0     24028    116748     19188         0      0.00
10:10:00       361164    930520     72.04         0     24996    116828     19188         0      0.00
10:20:00       356560    935124     72.40         0     26160    116852     19188         0      0.00
10:30:00       349332    942352     72.96         0     27944    116892     19188         0      0.00
10:40:00       347436    944248     73.10         0     29692    116924     19188         0      0.00
10:50:01       342628    949056     73.47         0     31324    116948     19188         0      0.00
11:00:00       339592    952092     73.71         0     32608    116952     19188         0      0.00
11:10:00       337632    954052     73.86         0     33860    116992     19188         0      0.00
11:20:00       332472    959212     74.26         0     35124    116996     19188         0      0.00
11:30:00       330344    961340     74.43         0     36564    117016     19188         0      0.00
12:00:00      1043196    248488     19.24         0     12780     84752     19188         0      0.00
12:10:00       659188    632496     48.97         0     15280    106392     19188         0      0.00
12:20:00       659508    632176     48.94         0     16500    106648     19188         0      0.00
12:30:01       656844    634840     49.15         0     17424    107164     19188         0      0.00
12:40:00       617020    674664     52.23         0     18696    113900     19188         0      0.00
12:50:00       548940    742744     57.50         0     19932    167984     19188         0      0.00
13:00:00       546816    744868     57.67         0     20864    167988     19188         0      0.00
Average:       416444    875240     67.76         0     46987    117352     19188         0      0.00
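
To find the gaps without reading every timestamp, a small script could compare consecutive sample times. This is a rough sketch only: the function name find_gaps and the one-minute slack are my own choices, it assumes the 10-minute sa interval Daniel mentioned, and it assumes all samples fall on the same day (no midnight wrap).

```python
import re
from datetime import datetime, timedelta

# A sample row starts with an HH:MM:SS timestamp; other lines
# (for example the "Average:" row) are skipped.
TIME_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s")


def find_gaps(lines, interval_minutes=10, slack_seconds=60):
    """Return (previous, current) timestamp pairs that are further
    apart than the expected sampling interval plus some slack."""
    limit = timedelta(minutes=interval_minutes, seconds=slack_seconds)
    gaps = []
    prev = None
    for line in lines:
        match = TIME_RE.match(line.lstrip())
        if not match:
            continue
        cur = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev is not None and cur - prev > limit:
            gaps.append((prev.strftime("%H:%M:%S"), cur.strftime("%H:%M:%S")))
        prev = cur
    return gaps


# A few rows copied from the sar output above.
sample = """\
07:10:00       288560   1003124     77.66
07:20:00       288288   1003396     77.68
08:40:00      1106488    185196     14.34
08:50:00      1044940    246744     19.10
09:00:01      1042608    249076     19.28
"""
print(find_gaps(sample.splitlines()))
```

Only the leading timestamp of each row is used, so it does not matter whether the rows are wrapped or not. On the full listing the missing stretches are 07:20:00 to 08:40:00 and 11:30:00 to 12:00:00.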

Both guests have a 19188k swap disk (VDISK) defined.
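
To put that swap size in proportion, here is the arithmetic as a short sketch. I am assuming the first two sar columns are free and used memory in KB (their sum is consistent with the percentage in the third column), and the function name is just illustrative.

```python
def swap_to_memory_percent(kb_mem_free, kb_mem_used, kb_swap_total):
    """Swap size as a percentage of total guest memory (all values in KB)."""
    total_kb = kb_mem_free + kb_mem_used
    return 100.0 * kb_swap_total / total_kb


# Values from the 06:50:01 sample; the column meaning is an assumption.
pct = swap_to_memory_percent(288612, 1003072, 19188)
print(f"swap is {pct:.2f}% of guest memory")
```

That works out to roughly 1.5% of the guest's memory, so the swap disk fills almost immediately once the guest starts to page.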

I am using Grinder to put the guests under load, but this is nowhere
near the sort of load I want to achieve. The Grinder test runs for
approx. 2-3 hours before a guest hangs.

If you log on to a hung guest using z/VM you get the reconnected
message, but no other response.

Troy.

On Wed, 15 Sep 2004 07:54:24 -0400, Daniel Jarboe
<[EMAIL PROTECTED]> wrote:
> If you watch the console long enough, do you see anything like:
>
> __alloc_pages: 0-order allocation failed
>
> or
>
> VM: killing process?
>
> How large did you define the guest?  How much is a "small amount" of
> swap?  What kind of workload?  What error message do you get when you
> try to log onto the linux console?
>
> Also, by default SLES8 kicks off sa every 10 minutes.  If you're lucky
> you might get one of these "hangs" shortly after one of these 10 minute
> intervals.  It might be interesting to check out the system activity
> data.  Something like:
>
> /usr/bin/sar -u -r -f /var/log/sa/sa.2004_09_15
>
> You can further narrow it if you want by specifying a start and end
> time, like:
>
> /usr/bin/sar -s 08:00:00 -e 09:00:00 -u -r -f /var/log/sa/sa.2004_09_15
>
> ~ Daniel
>
>
>
>
> > -----Original Message-----
> > From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
> > Troyski
> > Sent: Wednesday, September 15, 2004 6:58 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Guest freeze
> >
> > Just to let you know: I removed the samba client RPM from two of my
> > guests that exhibited the hang problem, but at least one still
> > shows the problem. I still think the problem is memory-related,
> > possibly a memory leak, as the guest that hung had reached 99% of
> > memory and 100% of swap (albeit a very small amount of swap) just
> > before it hung.
> >
> > Troy.
> >
> >
> > On Tue, 14 Sep 2004 13:32:10 -0400, Joe Poole <[EMAIL PROTECTED]> wrote:
> > > We've seen it when a Samba smbd session goes into a loop when
> > > someone, probably trying to back up his C drive, or store the
> > > family jpegs, runs the server out of space.  The solution is to
> > > find the process ID with TOP and kill that session.
> > >
> > >
> > >
> > > On Tuesday 14 September 2004 13:20, you wrote:
> > > > Do you know what specifically triggers the problem? I was just
> > > > asked if we could do samba.....
> > > >
> > > >
> > > >
> > > >
> > > > From: "Seader, Cameron" <[EMAIL PROTECTED] er.com>
> > > > Sent by: Linux on 390 Port <[EMAIL PROTECTED] IST.EDU>
> > > > Date: 09/14/2004 08:33 AM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Re: Guest freeze
> > > > Please respond to: Linux on 390 Port <[EMAIL PROTECTED] IST.EDU>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Here is some more information about what we had discovered after
> > > > talking with SuSE. This problem was never resolved.
> > > >
> > > > Just an update on the bug found yesterday. I just got off the
> > > > phone with Novell's SUSE support department and we were able to
> > > > narrow down which package was causing the conflict with the
> > > > openldap2-client library. Samba-2.2.8a has a service called nmb,
> > > > which is the netbios service and has hooks in the openldap2
> > > > library files. We were able to narrow it down to the nmb service:
> > > > when it is started it causes a looping process in the kernel
> > > > process ksoftirqd_CPU0. This investigation has been sent off to
> > > > the Development labs at SUSE in Germany. They will either find a
> > > > fix for the current package and send that out globally, send us a
> > > > beta version of SLES 9 (which is on beta 2 right now), or have
> > > > us wait until SLES 9 is released. Another option is to download
> > > > the Samba 3.0 binaries and compile it for our platform ourselves,
> > > > which would go out of the support bounds for SUSE.
> > > >
> > > > -Cameron Seader
> > > >
> > > > -----Original Message-----
> > > > From: Troyski [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, September 14, 2004 07:13
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Guest freeze
> > > >
> > > >
> > > > Hi all,
> > > >
> > > > Anybody seen the following conditions :-
> > > >
> > > > o VM linux guest hangs. No response to ssh or at 3270 console.
> > > >   Pings from other servers (external and internal) ok though (?)
> > > > o VM CPU @ 100%.
> > > > o No VM paging.
> > > > o VM guys say linux is "looping".
> > > >
> > > > zSeries 800/SLES8 SP2
> > > >
> > > > Would a linux guest memory issue cause this?
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
>



--
                      ---oOo---
Troyski (Public Email)
.........................................
:: Web/PGP Key    :: www.troyski.co.uk ::
:: MSN [EMAIL PROTECTED] :: YIM troy_muller       ::
:: ICQ #31206542  :: AIM trymul          ::
:: GS500E XJ600S  :: LCDR Troyski    ::
.........................................
:: NCOS || CQC || SCI100 || SCI101     ::
:: NUR101 || SCC#60975                    ::
.........................................
'He who dies with the most toys - wins.'
<Unknown>
                      ---oOo---
