I should point out that this hang is likely being misunderstood here. While this scenario will indeed drive paging over the edge, that's probably not what happened. If paging had been driven to that point, the system would have quickly taken a PGT004 abend and restarted.

Instead, I believe what happened is a particularly hard-to-solve variant of something mentioned before: difficulty allocating the CP structures required to represent the massive amount of storage. Page tables are only part of the problem. The upper-level DAT tables (region and segment) can be up to 4 frames long, and once storage utilization becomes heavy enough, storage becomes fragmented (PGMBK allocation being a factor here), making it very difficult for CP to allocate contiguous sets of 3 or 4 frames. We spent quite a bit of effort in z/VM 5.3.0 addressing the PGMBK side of this issue, but the harder problem of the upper-level tables remains a likely constraint point.

Problems of this sort are likely to result in temporary or permanent hangs of individual users and eventually of the entire system, which is consistent with this theory. I'd really need to see a dump of the system in question to confirm the hypothesis, however.

Bill Holder
z/VM Development, Memory Management team lead, IBM
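
To illustrate the fragmentation effect described in the post above, here is a toy sketch in Python. It is not CP's allocator and makes no claim about how z/VM actually manages frames: the frame pool, the uniformly random placement of in-use frames, and the names `simulate` and `free_runs` are all assumptions made purely for illustration. It only shows that plenty of free frames can coexist with very few places where 4 contiguous frames are free.

```python
import random


def free_runs(frames, length):
    """Count start positions where `length` consecutive frames are free.

    `frames` is a list of booleans: True means the frame is in use.
    """
    count = 0
    run = 0  # length of the current streak of free frames
    for in_use in frames:
        run = 0 if in_use else run + 1
        if run >= length:
            count += 1
    return count


def simulate(total_frames, utilization, seed=0):
    """Return a hypothetical frame map with a given fraction in use.

    In-use frames are scattered uniformly at random (an assumption;
    real allocation patterns are not uniform).
    """
    rng = random.Random(seed)
    frames = [False] * total_frames
    for i in rng.sample(range(total_frames), round(total_frames * utilization)):
        frames[i] = True
    return frames


if __name__ == "__main__":
    for util in (0.50, 0.90, 0.99):
        frames = simulate(100_000, util)
        free = frames.count(False)
        print(f"utilization {util:.0%}: {free} free frames, "
              f"{free_runs(frames, 4)} places a 4-frame table fits")
```

Running the script prints, for each utilization level, how many frames are free and how many positions could hold a 4-frame contiguous table. Under these assumptions the second number falls off far faster than the first as utilization rises, which is the shape of the problem described for the upper-level tables.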