Hi,
We’re seeing strange behaviour in our paging subsystem and are having great
difficulty solving the problem. We hope some of you can help us.
Our VM system pages extremely slowly: a Linux guest that has been inactive for
a long time is paged in from DASD by z/VM at approximately 1 MB/sec on an
otherwise almost idle system. The overall system page rate is 400-800
pages/second, half reads and half writes. We have 12 paging disks (3390-9)
distributed over 2 LCUs and attached through 6 FICON channels; according to our
performance readings, neither the disks nor the channels are a bottleneck.
At the same time, main storage is poorly utilized: Performance Toolkit reports
a storage utilization of 65-75%, and it reports strange values for “Total real
storage” and “Total available” storage, namely 0kB.
What could be wrong?
Kind regards,
Klaus Johansen
Additional information:
We have a z/VM system with approximately 25 zPenguins in an LPAR with 2 IFLs,
12GB main storage and 3GB expanded storage. We have overcommitted storage 2:1.
We are fully aware that the storage size of each Linux guest should be kept as
small as possible, since Linux uses all available memory for file cache etc.
We have certainly seen z/VM page file cache out to the paging volumes, which
makes memory allocation slow. We have not considered this a problem, since the
guests are rarely “active” at the same time (the memory accessed by the active
Linux guests should easily fit in main storage). But we didn’t know that VM
pages at only 1 MB/sec…
The “User Wait States” screen in Performance Toolkit confirms that the guests
are in page wait (97-100%) during these “1MB/sec. page-ins”.
The system is actually capable of paging faster: when all Linux guests are
paging at the same time (we haven’t had time to reschedule the daily cron job
for log rotation), we see a momentary page rate of 7000 pages/sec, which is
much better but nothing extraordinary.
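For reference, the page rates and MB/sec figures quoted above can be compared with a small conversion. This is only a sketch of the arithmetic, assuming 4 KB pages and 1 MB = 1024 × 1024 bytes; the function names are made up for illustration.

```python
PAGE_BYTES = 4096          # assumption: 4 KB pages
MB = 1024 * 1024           # assumption: binary megabytes


def pages_to_mb_per_sec(pages_per_sec, page_bytes=PAGE_BYTES):
    """Convert a page rate (pages/sec) into a data rate (MB/sec)."""
    return pages_per_sec * page_bytes / MB


def mb_to_pages_per_sec(mb_per_sec, page_bytes=PAGE_BYTES):
    """Convert a data rate (MB/sec) into a page rate (pages/sec)."""
    return mb_per_sec * MB / page_bytes


if __name__ == "__main__":
    # The slow single-guest page-in: 1 MB/sec is 256 pages/sec.
    print(f"1 MB/sec = {mb_to_pages_per_sec(1):.0f} pages/sec")
    # Rates mentioned in the text, expressed as MB/sec.
    for rate in (400, 800, 7000):
        print(f"{rate} pages/sec = {pages_to_mb_per_sec(rate):.2f} MB/sec")
```

On these assumptions the burst of 7000 pages/sec is roughly 27 MB/sec, about 27 times the rate seen when a single guest is paged in.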
We have considered whether our SRM LDUBUF, SRM STOBUF etc. need tuning, but
according to the “Performance book” these values mostly affect the scheduler
and dispatcher in relation to the Q1-Q3 queues, and there seem to be no
problems entering the dispatch list (furthermore, enabling “quick dispatch”
for a slow-paging guest has no effect).
Storage Utilization
Interval 15:39:59-15:40:59, on 2007/12/21 (CURRENT interval, select average
for mean data)
Main storage utilization:                XSTORE utilization:
Total real storage                 0kB   Total available                   0kB
Total available                    0kB   Att. to virt. machines            0kB
Offline storage frames       .......kB   Size of CP partition          3'072MB
SYSGEN storage size          .......kB   CP XSTORE utilization             99%
CP resident nucleus          .......kB   Low threshold for migr.       1'680kB
Shared storage               117'377MB   XSTORE allocation rate            0/s
FREE storage pages           .......kB   Average age of XSTORE blks      2855s
FREE stor. subpools           52'976kB   Average age at migration         ...s
Subpool stor. utilization           0%
Total DPA size                12'132MB   MDCACHE utilization:
Locked pages                  37'548kB   Min. size in XSTORE               0kB
Trace table                  .......kB   Max. size in XSTORE           3'072MB
Pageable                      12'095MB   Ideal size in XSTORE         77'692kB
Storage utilization                86%   Act. size in XSTORE          88'788kB
Tasks waiting for a frame            3   Bias for XSTORE                  1.00
Tasks waiting for a page           4/s   Min. size in main stor.           0kB
                                         Max. size in main stor.      12'288MB
V=R area:                                Ideal size in main stor.      1'265MB
Size defined                     ...kB   Act. size in main stor.     234'228kB
FREE storage                     ...kB   Bias for main stor.              1.00
V=R recovery area in use          ...%   MDCACHE limit / user         32'544kB
V=R user                      ........   Users with MDCACHE inserts          2
                                         MDISK cache read rate             2/s
Paging / spooling activity:              MDISK cache write rate        ...../s
Page moves <2GB for trans.         0/s   MDISK cache read hit rate         2/s
Fast path page-in rate            89/s   MDISK cache read hit ratio        76%
Long path page-in rate             1/s
Long path page-out rate            0/s   VDISKs:
Page read rate                   179/s   System limit (blocks)          Unlim.
Page write rate                    0/s   User limit (blocks)            Unlim.
Page read blocking factor            6   Main store page frames          11312
Page write blocking factor         ...   Expanded stor. pages              302
Migrate-out blocking factor        ...   Pages on DASD                  426964
Paging SSCH rate                  20/s
SPOOL read rate                    0/s
SPOOL write rate                   0/s
CP-owned disks: one of the 12 paging disks during the “1 MB/sec page-in”
Interval 15:37:04-15:37:05, on 2007/12/21
Detailed Analysis for Device 9F01 ( CP OWNED )
Device type : 3390-9     Function pend.:  .2ms    Device busy   :  12%
VOLSER      : VSPPG7     Disconnected  :  .0ms    I/O contention:   0%
Nr. of LINKs: 0          Connected     : 3.9ms    Reserved      :   0%
Last SEEK   : 1525       Service time  : 4.1ms    SENSE SSCH    :    0
SSCH rate/s : 30.0       Response time : 4.1ms    Recovery SSCH :    0
Avoided/s   : .0         CU queue time :  .0ms    Throttle del/s:  ...
Status: ONLINE
System Page/Spool I/O Details
Page reads/s  : 10.0     Total pages/s   : 87.5   PG serv. time: 1.7ms
Page writes/s : 77.5     I/Os avoided/s  :   .0   PG resp. time: 1.7ms
Spool reads/s :   .0     System I/Os /s  : 87.5   PG queue len.: 1.07
Spool writes/s:   .0     User interfer./s:   .0   Avail. bsize :    1
Path(s) to device 9F01: 61 65 78 79 6C 6F
Channel path status   : ON ON ON ON ON ON
                 Device Overall CU-Cache Performance          Split
DIR ADDR VOLSER  IO/S %READ %RDHIT %WRHIT ICL/S BYP/S   IO/S %READ %RDHIT
16  9F01 VSPPG7   5.3    15     25    100    .0    .0    5.3    15     25 (N)
                                                          .0     0      0 (S)
                                                          .0     0      0 (F)
MDISK Extent Userid Addr IO/s VSEEK Status LINK MDIO/s
+-------------------------------------------------------------------------+
! 1 - 10016 System PAGE RD/s WR/s MLOAD Used IO/S !
! LOAD ====> 10.0 77.5 1.7 20% 87.5 !
+-------------------------------------------------------------------------+
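As a cross-check on the device figures above, the utilization law (busy fraction = I/O rate × service time) can be applied to the reported SSCH rate and service time. This is only a sketch: the function names are illustrative, and the upper bound assumes the device handles one I/O at a time with an unchanged service time.

```python
def device_busy_fraction(ssch_per_sec, service_time_ms):
    """Utilization law: fraction of time the device is busy."""
    return ssch_per_sec * service_time_ms / 1000.0


def max_ssch_per_sec(service_time_ms):
    """Upper bound on the I/O rate if the device serves one I/O at a time
    and the service time stays the same (an assumption)."""
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Values taken from the device 9F01 screen: 30.0 SSCH/s, 4.1 ms service time.
    busy = device_busy_fraction(30.0, 4.1)
    print(f"Computed device busy: {busy * 100:.1f}%")
    print(f"Rough ceiling at 4.1 ms: {max_ssch_per_sec(4.1):.0f} SSCH/s")
```

The computed value of about 12.3% agrees with the 12% device busy reported on the screen, which is consistent with the disk itself not being the bottleneck.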