Re: VM lockup due to storage typo

Tom Duerbusch Tue, 15 Sep 2009 10:12:39 -0700

Thinking about this a little further....

How could 1 error cause this?


In the user direct, the user statement has:

USER LINUX27  xxxxxx  32M 600M G 

There are two memory-related parms.  The first is the size your guest machine 
is built with, in this case 32 MB.
The second is the maximum memory size for your guest, in this case 600 MB.

With either the initial size, or the dynamically defined size via a DEF STOR 
command, you can't exceed the maximum size.

So to define 8 TB of storage, you have to change the max size to be something 
very large.
And then define the machine to use that size.
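
As a rough illustration of catching this kind of typo before the guest is 
cycled, here is a minimal Python sketch that reads a USER statement and 
compares the two storage parms against the real memory of the LPAR.  It is 
not a CP or DirMaint facility.  The helper names (parse_size, 
check_user_statement), the K/M/G/T suffix handling, and the assumption that 
the statement is laid out as "USER userid password initial maximum class" 
are mine, and the host size is something you would supply yourself.

```python
import re

# Assumed size suffixes; binary multiples (1M = 1024 * 1024 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '32M' or '9728G' into bytes."""
    match = re.fullmatch(r"(\d+)([KMGT])", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized storage size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def check_user_statement(line, host_bytes):
    """Return a list of warnings for one USER directory statement.

    Assumes the layout: USER userid password initial maximum class
    """
    fields = line.split()
    if len(fields) < 5 or fields[0].upper() != "USER":
        raise ValueError(f"not a USER statement: {line!r}")
    userid = fields[1]
    initial = parse_size(fields[3])
    maximum = parse_size(fields[4])

    warnings = []
    if initial > maximum:
        warnings.append(f"{userid}: initial size exceeds maximum size")
    if maximum > host_bytes:
        warnings.append(f"{userid}: maximum size exceeds host memory")
    return warnings


if __name__ == "__main__":
    host = parse_size("175G")  # real memory of the LPAR in the original post
    for stmt in ("USER LINUX27  xxxxxx  32M 600M G",
                 "USER LINUX27  xxxxxx  9728G 9728G G"):
        print(stmt, "->", check_user_statement(stmt, host) or "ok")
```

Run against the directory source before putting it online, a check like 
this would flag the 9728G entry and leave the 32M/600M entry alone.  Whether 
"bigger than real memory" is the right threshold is a site decision, since 
overcommitting guests is normal on VM.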

So it seems to me that there are two mistakes.  

You told CP you might want a very large size, and when you finally asked for 
it, it obeyed.
That isn't a CP error.

The same problem occurs when you tell CP that you are ok with TB sized vdisks.  
And then you define one.
And then use it up <G>.

Of course, anything that can cause CP to crash isn't a good thing.
Perhaps we need a dedicated paging area for CP, i.e. something like the DUMP 
area for CP dumps, instead of using SPOL.  The guest machines are still going 
to crash, and the offending machine will be the last of many machines to bite 
the dust.  But, CP would survive.  It might be easier to IPL to get everything 
back running again.

Tom Duerbusch
THD Consulting


>>> "Schuh, Richard" <rsc...@visa.com> 9/15/2009 11:59 AM >>>
Maybe CP couldn't know that the guest would do something bad, but it should 
know that it has opened itself to the possibility that the guest could, in 
normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is that the 
guest cannot be allowed to harm the CP. This clearly violates that.

Regards, 
Richard Schuh 

 

> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: Re: VM lockup due to storage typo
> 
> CP wouldn't know at IPL time that the guest would (not could, but 
> would) cause such harm.
> 
> Just because you say you can use xxx GB, doesn't mean you 
> would actually use them.
> 
> When page fills, it overflows to spool.
> When spool fills, CP abends on the next pageout.
> 
> Tom Duerbusch
> THD Consulting
> 
> >>> Marcy Cortes <marcy.d.cor...@wellsfargo.com> 9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from 
> Oct 2007 for what happened when I did the same thing ;)
> 
> You probably filled page space.
> 
> I still think IBM should refuse to IPL a guest that will 
> cause such harm.
> 
> 
> Marcy 
> 
> 
> 
> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: [IBMVM] VM lockup due to storage typo
> 
> Does anyone have an idea of how we might have gotten out of 
> this without an IPL?
> 
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> Several guests needed more memory added so the directory was 
> updated and one by one the guests were shut down, logged off and 
> back on.  So far, so good.
> 
> But... In changing the memory for many guests, and it being 
> late at night after a long day, while meaning to set a 
> guest's memory to 9728M, it got set to 9728G.  When that 
> guest was cycled we saw the message on the console that its 
> memory was limited to 8TB (HCPLGN093E), then the VM system 
> appeared to freeze.
> 
> We couldn't get in via TCP/IP, or the HMC Operating System 
> Messages screen, or the HMC Integrated 3270.
> 
> Finally had to IPL.  Even that was weird, as I'd have expected the Load 
> Normal to shut down; it just IPLed.  We did NoAutolog, fixed the typo, 
> and all came back up OK...
> 
> I suspect CP was scrambling, paging everything in the world out as Linux 
> tried to initialize that 8TB of memory...  But I'm surprised I couldn't 
> even get into the HMC consoles (to kill just that one guest as opposed 
> to all of them).
> 
> Any thoughts?
> Lee
> -- 
> 
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com 
> Web:   www.siriuscom.com 
>
