Re: VM lockup due to storage typo

Quay, Jonathan (IHG) Thu, 17 Sep 2009 10:00:23 -0700

It sounds very similar in symptom to my minidisk cache overcommitment
problem that resulted in CP thrashing (and an APAR).

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Bill Holder
Sent: Thursday, September 17, 2009 12:34 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

I should point out that this hang is likely being misunderstood here.
While this scenario will indeed drive paging over the edge, that's not
likely what happened.  If paging had been driven to that point, the
system would have quickly taken a PGT004 abend and restarted.  Instead,
I believe what happened is likely a most difficult to solve variant on
something that was mentioned before: that is, difficulty allocating CP
structures required to represent the massive amount of storage.  Page
tables are only part of the problem.  The upper level DAT tables (region
and segment) can be up to 4 frames long, and once storage utilization
becomes heavy enough, it becomes fragmented (PGMBK allocation being
a factor here), making it very difficult for CP to allocate contiguous
sets of 3s and 4s.  We spent quite a bit of effort in z/VM 5.3.0
addressing the PGMBK side of this issue, but the harder problem of
the upper level tables remains as a likely constraint point.

Occurrences of this sort of problem are likely to result in temporary 
or permanent hangs of both individual users and eventually the entire 
system, which supports the theory in this case.  I'd really need to 
see a dump of the system in question to confirm this hypothesis, 
however.  

Bill Holder
z/VM Development, Memory Management team lead, IBM
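
The contiguous-allocation problem described above can be illustrated with a
toy model. This is only a sketch under stated assumptions, not CP's actual
allocator: the 64-frame pool, the every-third-frame pinning pattern, and the
names longest_free_run and can_allocate are all hypothetical. It shows how a
pool can have plenty of free frames in total and still be unable to satisfy
a request for 4 contiguous frames.

```python
def longest_free_run(frames):
    """Return the length of the longest run of consecutive free frames.

    frames is a list of booleans; True means the frame is in use.
    """
    best = run = 0
    for in_use in frames:
        run = 0 if in_use else run + 1
        best = max(best, run)
    return best


def can_allocate(frames, n):
    """True if some run of n contiguous free frames exists."""
    return longest_free_run(frames) >= n


# Hypothetical pool of 64 frames, all free to start with.
TOTAL = 64
frames = [False] * TOTAL

# Assumption for illustration: small, long-lived allocations end up
# scattered through the pool. Here every third frame is pinned.
for i in range(0, TOTAL, 3):
    frames[i] = True

free_count = frames.count(False)

print("free frames:", free_count, "of", TOTAL)
print("longest contiguous free run:", longest_free_run(frames))
print("can allocate 4 contiguous frames:", can_allocate(frames, 4))
```

In this model roughly two thirds of the pool is free, yet no request larger
than 2 contiguous frames can be satisfied, so a requester that needs a
multi-frame table would have to wait or fail.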
