Can you post the results of
Q SRM
Q ALLOC PAGE

Marcy

________________________________
From: The IBM z/VM Operating System [mailto:[email protected]] On Behalf Of Daniel Tate
Sent: Monday, July 26, 2010 8:31 AM
To: [email protected]
Subject: [IBMVM] Linux on Z/VM running WAS problems - anyone got any tips?

I apologize for this not being a "direct" z/VM question. I've posted to the Linux-390 group to get the Linux point of view, but to explore all angles I am also asking here whether there is anything I can set or do from z/VM that would help the situation. I'd like the console to "finish" the scroll, but I'm not sure how to do that except by taping down Control-C (I'm using c3270). A CP Q ALL is at the bottom of all this mess.

The e-mail sent to Linux-s390, for reference:

We're running WebSphere on a z9 under z/VM; 4 systems are live out of 8. It is running apps that consume around 16GB of memory on a Windows machine. On this system we have allocated 10G of real storage (RAM) and around 35GB of swap. When WebSphere starts, it eventually consumes all the memory and halts the system, but does not panic it. We are running 64-bit. I'm a z/VM novice, so I don't know much about what to do. Here is some information from our WAS admin:

"We are running WebSphere 6.1.0.25 with FP EJB3.0, Webservices and Web 2.0 installed. There are two nodes running 14 application servers each. There are currently 32 applications installed but not currently running. No security has been enabled for WebSphere at this time."

At this point I see two problems:
1) Why is OOM Kill not functioning properly?
2) Why is WebSphere performance so awful?
and have two questions:
1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on z/VM? So far we've been using dated case studies and redbooks that seem to be filled with inaccuracies or outdated information.
2) Is there any way to force a coredump via CP, like you can with the magic sysrq?

All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (s390x)
Release:        11
Codename:       n/a

Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08,  2 users,  load average: 51.47, 22.20, 10.25
Tasks: 129 total,   4 running, 125 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.7%us, 81.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si,  1.2%st
Mem:  10268344k total, 10220568k used,    47776k free,      548k buffers
Swap: 35764956k total, 35764956k used,        0k free,    56340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16:03.63 java
24920 wasadmin  20   0 1522m 263m 2616 S    0  2.6  16:06.29 java
25422 wasadmin  20   0 1584m 229m 2284 S    0  2.3  16:02.18 java
27892 wasadmin  20   0 1414m 263m 2648 S    0  2.6  15:45.96 java
28184 wasadmin  20   0 1523m 241m 2320 S    0  2.4  15:42.21 java
28486 wasadmin  20   0 1450m 231m 2288 S    0  2.3  15:46.53 java
30625 wasadmin  20   0 1477m 251m 3024 S    0  2.5  15:44.80 java

-----------------
Here are a few screen grabs from the 3270 console session:

Unless you get a _continuous_flood_ of these messages it means everything is working fine.
Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.
java: page allocation failure.
order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
00000000014a4e88 0000000000000007 0000000000634e00 0000000000000000
000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
Call Trace:
([<0000000000105174>] show_trace+0x130/0x134)
 [<000000000019890a>] __alloc_pages_internal+0x406/0x55c
 [<00000000001c7056>] cache_grow+0x382/0x458
 [<00000000001c7440>] cache_alloc_refill+0x314/0x36c
 [<00000000001c6c12>] kmem_cache_alloc+0x82/0x144
 [<00000000003228f2>] __alloc_skb+0x82/0x208
 [<000000000032378e>] dev_alloc_skb+0x36/0x64
 [<000003e0001a030e>] qeth_core_get_next_skb+0x31e/0x704 [qeth]
 [<000003e0000d5f8c>] qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
 [<000003e0000d6574>] qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
 [<000003e0000ebc44>] qdio_kick_inbound_handler+0xbc/0x178 [qdio]
 [<000003e0000ee58c>] __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
 [<000000000013a800>] tasklet_action+0x10c/0x1e4
 [<000000000013b908>] __do_softirq+0xe0/0x1c8
 [<0000000000110252>] do_softirq+0xaa/0xb0
 [<000000000013b772>] irq_exit+0xc2/0xcc
 [<00000000002f6586>] do_IRQ+0x132/0x1c8
 [<0000000000114148>] io_return+0x0/0x8
 [<00000000002b850e>] _raw_spin_lock_wait+0x86/0xa4
([<000003e047d6fa00>] 0x3e047d6fa00)
 [<000000000019eb9c>] shrink_page_list+0x1a0/0x584
 [<000000000019f184>] shrink_inactive_list+0x204/0x5b0
 [<000000000019f620>] shrink_zone+0xf0/0x1d0
 [<000000000019f882>] shrink_zones+0xae/0x184
 [<00000000001a02be>] do_try_to_free_pages+0x96/0x3fc
 [<00000000001a072c>] try_to_free_pages+0x74/0x7c
 [<0000000000198730>] __alloc_pages_internal+0x22c/0x55c
 [<000000000019b5a2>] __do_page_cache_readahead+0x10a/0x2ac
 [<000000000019b7cc>] do_page_cache_readahead+0x88/0xa8
 [<000000000019170e>] filemap_fault+0x33a/0x448
 [<00000000001a55bc>] __do_fault+0x78/0x580
 [<00000000001a962e>] handle_mm_fault+0x1e6/0x4c0
 [<00000000003b9e1e>] do_dat_exception+0x29e/0x388
 [<0000000000113c0c>] sysc_return+0x0/0x8
 [<0000020000214bde>] 0x20000214bde
Mem-Info:
DMA per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0
 free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
lowmem_reserve[]: 0 8064 8064
Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
9283 total pagecache pages
0 pages in swap cache
Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
Free swap  = 0kB
Total swap = 35764956kB
2621440 pages RAM
54354 pages reserved
22356 pages shared
2538214 pages non-shared
The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means everything is working fine.
Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.
java: page allocation failure.
order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
00000000014a5dd3 0000000000000007 0000000000634e00 0000000000000000
000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
Call Trace:
([<0000000000105174>] show_trace+0x130/0x134)
 [<000000000019890a>] __alloc_pages_internal+0x406/0x55c
 [<00000000001c7056>] cache_grow+0x382/0x458
 [<00000000001c7440>] cache_alloc_refill+0x314/0x36c
 [<00000000001c6c12>] kmem_cache_alloc+0x82/0x144
 [<00000000003228f2>] __alloc_skb+0x82/0x208
 [<000000000032378e>] dev_alloc_skb+0x36/0x64
 [<000003e0001a030e>] qeth_core_get_next_skb+0x31e/0x704 [qeth]
 [<000003e0000d5f8c>] qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
 [<000003e0000d6574>] qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
 [<000003e0000ebc44>] qdio_kick_inbound_handler+0xbc/0x178 [qdio]
 [<000003e0000ee58c>] __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
 [<000000000013a800>] tasklet_action+0x10c/0x1e4
 [<000000000013b908>] __do_softirq+0xe0/0x1c8
 [<0000000000110252>] do_softirq+0xaa/0xb0
 [<000000000013b772>] irq_exit+0xc2/0xcc
 [<00000000002f6586>] do_IRQ+0x132/0x1c8
 [<0000000000114148>] io_return+0x0/0x8
 [<00000000002b850e>] _raw_spin_lock_wait+0x86/0xa4
([<000003e047d6fa00>] 0x3e047d6fa00)
 [<000000000019eb9c>] shrink_page_list+0x1a0/0x584
 [<000000000019f184>] shrink_inactive_list+0x204/0x5b0
 [<000000000019f620>] shrink_zone+0xf0/0x1d0
 [<000000000019f882>] shrink_zones+0xae/0x184
 [<00000000001a02be>] do_try_to_free_pages+0x96/0x3fc
 [<00000000001a072c>] try_to_free_pages+0x74/0x7c
 [<0000000000198730>] __alloc_pages_internal+0x22c/0x55c
 [<000000000019b5a2>] __do_page_cache_readahead+0x10a/0x2ac
 [<000000000019b7cc>] do_page_cache_readahead+0x88/0xa8
 [<000000000019170e>] filemap_fault+0x33a/0x448
 [<00000000001a55bc>] __do_fault+0x78/0x580
 [<00000000001a962e>] handle_mm_fault+0x1e6/0x4c0
 [<00000000003b9e1e>] do_dat_exception+0x29e/0x388
 [<0000000000113c0c>] sysc_return+0x0/0x8
 [<0000020000214bde>] 0x20000214bde
Mem-Info:
DMA per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0
 free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
lowmem_reserve[]: 0 8064 8064
Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
9283 total pagecache pages
0 pages in swap cache
Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
Free swap  = 0kB
Total swap = 35764956kB
2621440 pages RAM
54354 pages reserved
22356 pages shared
2538214 pages non-shared
__ratelimit: 4 callbacks suppressed
The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means everything is working fine.
Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.
java: page allocation failure.
order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696

etc., etc. for HUNDREDS of pages, perhaps infinitely.
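
One way to read the Mem-Info block above is to compare each zone's free memory with its min/low/high watermarks and to confirm that the buddy free-list lines add up to the reported totals. The Python sketch below is not part of the original post: the figures are copied from the dump, the function names (buddy_total_kb, zone_state) are hypothetical, and the 4 kB page size is an assumption that matches the 4kB entries in the free lists.

```python
import re

# Per-zone figures copied from the Mem-Info dump above (all in kB).
ZONES = {
    "DMA":    {"free": 33220, "min": 2568,  "low": 3208,  "high": 3852},
    "Normal": {"free": 3856,  "min": 10276, "low": 12844, "high": 15412},
}

# Buddy allocator free lists, copied from the same dump.
BUDDY = {
    "DMA": "101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB",
    "Normal": "0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB",
}


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one buddy free-list line."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


def zone_state(zone):
    """Say where a zone's free memory sits relative to its watermarks."""
    if zone["free"] < zone["min"]:
        return "below min"
    if zone["free"] < zone["low"]:
        return "below low"
    if zone["free"] < zone["high"]:
        return "below high"
    return "above high"


def report():
    """Build one summary line per zone."""
    lines = []
    for name, zone in ZONES.items():
        total = buddy_total_kb(BUDDY[name])
        lines.append(f"{name}: free={zone['free']}kB buddy_sum={total}kB {zone_state(zone)}")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
    # 2621440 pages of RAM at an assumed 4 kB per page.
    print("RAM:", 2621440 * 4 // 1024 // 1024, "GiB")
```

With free swap at 0kB and the Normal zone under its min watermark, reclaim has very little left to free, which would be consistent with the repeated allocation failures in the network receive path.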
00:
00: CP Q ALL
00: STORAGE = 15G CONFIGURED = 15G INC = 64M STANDBY = 0 RESERVED = 0
00: OSA  039C ATTACHED TO TCPIP    039C DEVTYPE OSA CHPID 01 OSD
00: OSA  039D ATTACHED TO TCPIP    039D DEVTYPE OSA CHPID 01 OSD
00: OSA  039E ATTACHED TO TCPIP    039E DEVTYPE OSA CHPID 01 OSD
00: OSA  03A0 ATTACHED TO DTCVSW2  03A0 DEVTYPE OSA CHPID 01 OSD
00: OSA  03A1 ATTACHED TO DTCVSW2  03A1 DEVTYPE OSA CHPID 01 OSD
00: OSA  03A2 ATTACHED TO DTCVSW2  03A2 DEVTYPE OSA CHPID 01 OSD
00: OSA  03C0 ATTACHED TO DTCVSW1  03C0 DEVTYPE OSA CHPID 02 OSD
00: OSA  03C1 ATTACHED TO DTCVSW1  03C1 DEVTYPE OSA CHPID 02 OSD
00: OSA  03C2 ATTACHED TO DTCVSW1  03C2 DEVTYPE OSA CHPID 02 OSD
00: FCP  5000 ATTACHED TO LINXDEV  5000 CHPID 46
00:      WWPN C05076FAE3000400
00: FCP  5001 ATTACHED TO LINXD001 5001 CHPID 46
00:      WWPN C05076FAE3000404
00: FCP  5002 ATTACHED TO LINXD002 5002 CHPID 46
00:      WWPN C05076FAE3000408
00: FCP  5003 ATTACHED TO LINXD003 5003 CHPID 46
00:      WWPN C05076FAE300040C
00: FCP  5100 ATTACHED TO LINXDEV  5100 CHPID 47
00:      WWPN C05076FAE3000900
00: FCP  5101 ATTACHED TO LINXD001 5101 CHPID 47
00:      WWPN C05076FAE3000904
00: FCP  5102 ATTACHED TO LINXD002 5102 CHPID 47
00:      WWPN C05076FAE3000908
00: FCP  5103 ATTACHED TO LINXD003 5103 CHPID 47
00:      WWPN C05076FAE300090C
00: DASD 9F7D CP SYSTEM VM6LXD   0
00: DASD 9F7E CP SYSTEM VM6LXE   0
00: DASD 9F80 CP SYSTEM VM6LX9   2
00: DASD 9F81 CP SYSTEM VM6LXA   2
00: DASD 9F82 CP SYSTEM VM6LXB   0
00: DASD 9F83 CP SYSTEM VM6LXC   0
00: DASD 9F84 CP OWNED  VM6RES   135
00: DASD 9F85 CP OWNED  VM6SPL   0
00: DASD 9F86 CP OWNED  VM6PG1   0
00: DASD 9F87 CP OWNED  VM6PG2   0
00: DASD 9F88 CP OWNED  VM6LX1   4
00: DASD 9F89 CP SYSTEM VM6LX2   0
00: DASD 9F8A CP SYSTEM VM6LX3   0
00: DASD 9F8B CP SYSTEM VM6LX4   0
00: DASD 9F8C CP SYSTEM VM6LX5   2
00: DASD 9F8D CP SYSTEM VM6LX6   0
00: DASD 9F8E CP SYSTEM VM6LX7   0
00: DASD 9F8F CP SYSTEM VM6LX8   2
00: DASD 9FC7 CP SYSTEM VM6LX6   0
00: DASD 9FC8 CP SYSTEM VM6LX5   2
00: DASD 9FC9 CP SYSTEM VM6LX2   0
00: DASD 9FCA CP SYSTEM VM6LX4   0
00: DASD 9FCB CP SYSTEM VM6LX3   0
00: DASD 9FCE CP SYSTEM VM6LX1   4
00: DASD 9FCF CP SYSTEM VM6PG2   0
00: DASD 9FD0 CP SYSTEM VM6PG1   0
00: DASD 9FD1 CP SYSTEM VM6SPL   0
00: DASD 9FD2 CP SYSTEM VM6RES   135
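
The Q ALL output reports STORAGE = 15G for z/VM itself, while the post says the Linux guest is given 10G. A rough storage overcommit estimate can be sketched from those two numbers, which may be part of why paging allocation (Q ALLOC PAGE) is of interest. The Python sketch below is only an illustration: the helper names are hypothetical, and it ASSUMES that all four live guests are defined with 10G each, which the post does not state.

```python
def to_gib(size):
    """Convert a CP-style size such as '15G' or '64M' to GiB."""
    units = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}
    return float(size[:-1]) * units[size[-1].upper()]


def overcommit_ratio(guest_sizes, cp_storage):
    """Total guest virtual storage divided by the real storage CP reports."""
    return sum(to_gib(s) for s in guest_sizes) / to_gib(cp_storage)


# From the post: CP reports STORAGE = 15G and the failing guest has 10G.
# ASSUMPTION: the four live guests are all sized at 10G.
ratio = overcommit_ratio(["10G"] * 4, "15G")
print(f"virtual-to-real storage ratio: {ratio:.2f}:1")
```

If the real guest sizes differ, substitute them in the list; the point is only that any ratio above 1:1 means z/VM must page guest storage out to its paging volumes under load.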
