First of all, you've run out of memory on that server (Swap: 35764956k total, 35764956k used). It ate all of the 10G of RAM and all of the 35G of swap. How many JVMs are running, and what are their min/max heap sizes?
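Both the exhaustion and the JVM count can be read straight off the `top` snapshot quoted further down. The Python sketch below is a hypothetical helper called `summarise`. It assumes the procps `top` layout shown in that paste, meaning the default columns and sizes suffixed with k/m/g. It reads the Mem: and Swap: header lines, counts the `java` rows, and totals their VIRT and RES. It cannot see heap settings. The -Xms/-Xmx values have to come from the java command lines (for example `ps -u wasadmin -o pid,args`) or from the WAS server definitions.

```python
def to_kib(field):
    """Convert a top size field such as '1506m', '1.3g', '2860' or '35764956k' to KiB."""
    multipliers = {"k": 1, "m": 1024, "g": 1024 ** 2}
    suffix = field[-1].lower()
    if suffix in multipliers:
        return float(field[:-1]) * multipliers[suffix]
    return float(field)  # no suffix: top reports KiB


def summarise(snapshot):
    """Count java processes and total their VIRT/RES from a `top -b` snapshot.

    Assumes (hypothetically) the procps layout pasted in this thread:
      Mem:  <total>k total, <used>k used, ...
      Swap: <total>k total, <used>k used, ...
      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    summary = {"jvms": 0, "jvm_virt_kib": 0.0, "jvm_res_kib": 0.0}
    for line in snapshot.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("Mem:", "Swap:") and len(fields) >= 4:
            name = fields[0].rstrip(":").lower()  # "mem" or "swap"
            summary[name + "_total_kib"] = to_kib(fields[1])
            summary[name + "_used_kib"] = to_kib(fields[3])
        elif len(fields) >= 12 and fields[0].isdigit() and fields[11] == "java":
            summary["jvms"] += 1
            summary["jvm_virt_kib"] += to_kib(fields[4])  # VIRT column
            summary["jvm_res_kib"] += to_kib(fields[5])   # RES column
    return summary
```

Fed the snapshot from this thread, `swap_used_kib` and `swap_total_kib` should both come back as 35764956. The process listing is marked partial, so `jvms` is only a lower bound on the number of JVMs.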
Marcy

-----Original Message-----
From: Linux on 390 Port [mailto:[email protected]] On Behalf Of Daniel Tate
Sent: Monday, July 26, 2010 8:24 AM
To: [email protected]
Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

We're running WebSphere on a z9 under z/VM. Four of our 8 systems are live. This one runs applications that consume around 16GB of memory on a Windows machine. Here we have allocated 10GB of real storage (RAM) and around 35GB of swap. When WebSphere starts, it eventually consumes all the memory and halts the system, although the kernel does not panic. We are running 64-bit. I'm a z/VM novice, so I don't know what to do next.

Here is some information from our WAS admin:

"We are running WebSphere 6.1.0.25 with the EJB 3.0, Web Services and Web 2.0 feature packs installed. There are two nodes running 14 application servers each. There are currently 32 applications installed but not running. No security has been enabled for WebSphere at this time."

At this point I see two problems:
1) Why is the OOM killer not working properly?
2) Why is WebSphere performance so awful?

I also have two questions:
1) Does anyone have any PRACTICAL experience or tips for optimizing SLES11 on z/VM? So far we've been using dated case studies and Redbooks that seem full of inaccuracies or outdated information.
2) Is there any way to force a core dump via CP, the way you can with the magic SysRq key?
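For problem 1 (the OOM killer), one thing worth checking is the kernel's commit accounting. The kernel default is `vm.overcommit_memory=0`. In that mode, the total committed address space (`Committed_AS` in /proc/meminfo) may grow past `CommitLimit`, which is roughly swap plus `vm.overcommit_ratio` percent of RAM (50 by default). The OOM killer only runs once page reclaim stops making progress, so a guest that can still page may thrash for a long time first.

The sketch below assumes the standard /proc/meminfo format with values in kB. It reports how far `Committed_AS` has run past the limit. The functions `parse_meminfo`, `commit_report` and `report_live_system` are hypothetical helpers, not SUSE or IBM tools.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Values are kB for most fields (HugePages_* are plain page counts).
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def commit_report(meminfo_text, overcommit_mode):
    """Describe Committed_AS relative to CommitLimit.

    CommitLimit is only enforced when vm.overcommit_memory == 2; in modes
    0 and 1 it is informational and Committed_AS may exceed it.
    """
    info = parse_meminfo(meminfo_text)
    limit = info["CommitLimit"]
    committed = info["Committed_AS"]
    pct = 100.0 * committed / limit
    state = "enforced" if overcommit_mode == 2 else "not enforced"
    return (f"Committed_AS {committed} kB = {pct:.0f}% of CommitLimit {limit} kB "
            f"(limit {state}, vm.overcommit_memory={overcommit_mode})")


def report_live_system():
    """Read the live values; only meaningful on a Linux host."""
    mode = int(Path("/proc/sys/vm/overcommit_memory").read_text())
    return commit_report(Path("/proc/meminfo").read_text(), mode)
```

Switching to `vm.overcommit_memory=2` makes the kernel refuse allocations beyond `CommitLimit` instead. With this many JVMs, that can mean start-up or allocation failures, so treat it as something to test, not a fix.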
All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (s390x)
Release: 11
Codename: n/a

Here is a partial top listing from shortly before the system died:

top - 08:13:14 up 2 days, 16:08, 2 users, load average: 51.47, 22.20, 10.25
Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.3%si, 1.2%st
Mem: 10268344k total, 10220568k used, 47776k free, 548k buffers
Swap: 35764956k total, 35764956k used, 0k free, 56340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16:03.63 java
24920 wasadmin  20   0 1522m 263m 2616 S    0  2.6  16:06.29 java
25422 wasadmin  20   0 1584m 229m 2284 S    0  2.3  16:02.18 java
27892 wasadmin  20   0 1414m 263m 2648 S    0  2.6  15:45.96 java
28184 wasadmin  20   0 1523m 241m 2320 S    0  2.4  15:42.21 java
28486 wasadmin  20   0 1450m 231m 2288 S    0  2.3  15:46.53 java
30625 wasadmin  20   0 1477m 251m 3024 S    0  2.5  15:44.80 java

-----------------

Here are a few screen grabs from the 3270 console session:

Unless you get a _continuous_flood_ of these messages it means everything is working fine.
Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.
java: page allocation failure.
order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
00000000014a4e88 0000000000000007 0000000000634e00 0000000000000000
000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
Call Trace:
([<0000000000105174>] show_trace+0x130/0x134)
 [<000000000019890a>] __alloc_pages_internal+0x406/0x55c
 [<00000000001c7056>] cache_grow+0x382/0x458
 [<00000000001c7440>] cache_alloc_refill+0x314/0x36c
 [<00000000001c6c12>] kmem_cache_alloc+0x82/0x144
 [<00000000003228f2>] __alloc_skb+0x82/0x208
 [<000000000032378e>] dev_alloc_skb+0x36/0x64
 [<000003e0001a030e>] qeth_core_get_next_skb+0x31e/0x704 [qeth]
 [<000003e0000d5f8c>] qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
 [<000003e0000d6574>] qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
 [<000003e0000ebc44>] qdio_kick_inbound_handler+0xbc/0x178 [qdio]
 [<000003e0000ee58c>] __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
 [<000000000013a800>] tasklet_action+0x10c/0x1e4
 [<000000000013b908>] __do_softirq+0xe0/0x1c8
 [<0000000000110252>] do_softirq+0xaa/0xb0
 [<000000000013b772>] irq_exit+0xc2/0xcc
 [<00000000002f6586>] do_IRQ+0x132/0x1c8
 [<0000000000114148>] io_return+0x0/0x8
 [<00000000002b850e>] _raw_spin_lock_wait+0x86/0xa4
([<000003e047d6fa00>] 0x3e047d6fa00)
 [<000000000019eb9c>] shrink_page_list+0x1a0/0x584
 [<000000000019f184>] shrink_inactive_list+0x204/0x5b0
 [<000000000019f620>] shrink_zone+0xf0/0x1d0
 [<000000000019f882>] shrink_zones+0xae/0x184
 [<00000000001a02be>] do_try_to_free_pages+0x96/0x3fc
 [<00000000001a072c>] try_to_free_pages+0x74/0x7c
 [<0000000000198730>] __alloc_pages_internal+0x22c/0x55c
 [<000000000019b5a2>] __do_page_cache_readahead+0x10a/0x2ac
 [<000000000019b7cc>] do_page_cache_readahead+0x88/0xa8
 [<000000000019170e>] filemap_fault+0x33a/0x448
 [<00000000001a55bc>] __do_fault+0x78/0x580
 [<00000000001a962e>] handle_mm_fault+0x1e6/0x4c0
 [<00000000003b9e1e>] do_dat_exception+0x29e/0x388
 [<0000000000113c0c>] sysc_return+0x0/0x8
 [<0000020000214bde>] 0x20000214bde
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0 free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
lowmem_reserve[]: 0 8064 8064
Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
9283 total pagecache pages
0 pages in swap cache
Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
Free swap = 0kB
Total swap = 35764956kB
2621440 pages RAM
54354 pages reserved
22356 pages shared
2538214 pages non-shared

The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means everything is working fine.
Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.
java: page allocation failure.
order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696

*etc., etc., for HUNDREDS of pages...*

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
