Set swappiness to 0. Can you just start one node as a test?

Ray

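Ray's first suggestion is a single kernel tunable. At runtime it is normally changed with `sysctl -w vm.swappiness=0`, and it is made persistent with a `vm.swappiness = 0` line in /etc/sysctl.conf. The minimal Python sketch below does only the runtime part, by writing /proc/sys/vm/swappiness directly. The helper names and the `path` parameter are hypothetical; `path` exists only so the sketch can be tried against a scratch file. Writing the real /proc file requires root. The 0 to 100 range check matches 2.6-era kernels such as the SLES11 one in this thread.

```python
from pathlib import Path

# Runtime location of the vm.swappiness tunable on Linux.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path: str = SWAPPINESS_PATH) -> int:
    """Return the current vm.swappiness value."""
    return int(Path(path).read_text().strip())


def set_swappiness(value: int, path: str = SWAPPINESS_PATH) -> int:
    """Write a new vm.swappiness value and return the value read back.

    Same effect as `sysctl -w vm.swappiness=<value>`. Needs root when `path`
    is the real /proc file, and the setting does not survive a reboot.
    """
    # 2.6.x kernels such as the SLES11 one in this thread accept 0-100.
    if not 0 <= value <= 100:
        raise ValueError("vm.swappiness must be between 0 and 100")
    Path(path).write_text(f"{value}\n")
    return read_swappiness(path)
```

For example, calling `set_swappiness(0)` as root on the guest applies the change immediately. The change is lost at reboot unless /etc/sysctl.conf is updated as well.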
> -----Original Message-----
> From: Linux on 390 Port [mailto:[email protected]] On Behalf Of Daniel Tate
> Sent: Monday, July 26, 2010 2:28 PM
> To: [email protected]
> Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?
>
> Yeah, I saw that. The problem is that these same apps run in 16GB of
> memory on a Windows box.
>
> We have 28 JVMs, and their min/max heap sizes are set to 50/256.
>
> On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes <[email protected]> wrote:
>
> > First of all, you've run out of memory on that server (Swap: 35764956k
> > total, 35764956k used).
> > It ate all of the 10G of RAM and all of the 35G of swap.
> > How many JVMs are running, and what are their min/max heap sizes?
> >
> > Marcy
> >
> > "This message may contain confidential and/or privileged information. If
> > you are not the addressee or authorized to receive this for the addressee,
> > you must not use, copy, disclose, or take any action based on this message
> > or any information herein. If you have received this message in error,
> > please advise the sender immediately by reply e-mail and delete this
> > message. Thank you for your cooperation."
> >
> >
> > -----Original Message-----
> > From: Linux on 390 Port [mailto:[email protected]] On Behalf Of
> > Daniel Tate
> > Sent: Monday, July 26, 2010 8:24 AM
> > To: [email protected]
> > Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?
> >
> > We're running WebSphere on a z9 under z/VM; 4 of the 8 systems are live.
> > It is running apps that consume around 16GB of memory on a Windows
> > machine. On this system we have allocated 10G of real storage (RAM) and
> > around 35GB of swap. When WebSphere starts, it eventually consumes all
> > the memory and halts the system, but does not panic it. We are running
> > 64-bit. I'm a z/VM novice, so I don't know what to do.
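The figures in this exchange can be cross-checked with a little arithmetic. The Python sketch below is a rough budget, not a sizing method from the thread. It assumes (hypothetically) that "50/256" means an initial/maximum Java heap of 50 MB/256 MB for each of the 28 JVMs. It takes the Mem and Swap totals from the top output quoted further down, and it ignores all non-heap JVM memory (JIT, class data, thread stacks, native buffers).

```python
# Rough memory budget for the guest described in this thread.
# Assumption (hypothetical): "50/256" means initial/maximum heap in MB.

JVM_COUNT = 28                 # 2 nodes x 14 application servers
MAX_HEAP_MB = 256              # assumed maximum heap per JVM
MEM_TOTAL_KB = 10_268_344      # "Mem:" total from the top output
SWAP_TOTAL_KB = 35_764_956     # "Swap:" total from the top output


def heap_budget(jvms: int, max_heap_mb: int, mem_kb: int, swap_kb: int) -> dict:
    """Compare the configured heap ceiling with the memory the guest has."""
    heap_kb = jvms * max_heap_mb * 1024
    return {
        "max_heap_kb": heap_kb,
        # Share of real storage the heaps could occupy if all grow to max.
        "heap_share_of_ram": heap_kb / mem_kb,
        # How many times the heap ceiling fits into RAM + swap, which the
        # thread reports as completely used.
        "ram_plus_swap_over_heap": (mem_kb + swap_kb) / heap_kb,
    }


budget = heap_budget(JVM_COUNT, MAX_HEAP_MB, MEM_TOTAL_KB, SWAP_TOTAL_KB)
for key, value in budget.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions the heaps could occupy at most about 71% of real storage, while RAM plus swap is roughly 6.3 times that ceiling. If the 256 MB limits really apply to every JVM, most of what filled memory and swap lies outside the Java heaps. The two java processes resident at 1.3 GB in the top output are worth checking against those limits.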
> >
> > Here is some information from our WAS admin:
> > "We are running WebSphere 6.1.0.25 with the EJB 3.0, Web Services and
> > Web 2.0 feature packs installed. There are two nodes running 14
> > application servers each. There are currently 32 applications
> > installed but not running. No security has been enabled for
> > WebSphere at this time."
> >
> >
> > At this point I see two problems:
> >
> > 1) Why is the OOM killer not functioning properly?
> > 2) Why is WebSphere performance so awful?
> >
> > and I have two questions:
> >
> > 1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on
> > z/VM? So far we've been using dated case studies and Redbooks that seem
> > to be filled with inaccuracies or outdated information.
> > 2) Is there any way to force a core dump via CP, like you can with the
> > magic SysRq key?
> >
> > All systems are running the same release and patch level:
> >
> > [root] bwzld001:~# lsb_release -a
> > LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
> > Distributor ID: SUSE LINUX
> > Description: SUSE Linux Enterprise Server 11 (s390x)
> > Release: 11
> > Codename: n/a
> >
> >
> > Here is partial top output from shortly before the system died:
> >
> > top - 08:13:14 up 2 days, 16:08, 2 users, load average: 51.47, 22.20, 10.25
> > Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.3%si, 1.2%st
> > Mem: 10268344k total, 10220568k used, 47776k free, 548k buffers
> > Swap: 35764956k total, 35764956k used, 0k free, 56340k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> > 26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
> > 29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
> > 24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
> > 24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
> > 26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
> > 27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
> > 24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
> > 25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
> > 30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
> > 25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
> > 27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
> > 27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
> > 29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
> > 31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
> > 25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
> > 24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
> > 28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
> > 26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
> > 25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
> > 29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
> > 25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
> > 26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
> > 31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
> >  3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
> > 24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16:03.63 java
> > 24920 wasadmin  20   0 1522m 263m 2616 S    0  2.6  16:06.29 java
> > 25422 wasadmin  20   0 1584m 229m 2284 S    0  2.3  16:02.18 java
> > 27892 wasadmin  20   0 1414m 263m 2648 S    0  2.6  15:45.96 java
> > 28184 wasadmin  20   0 1523m 241m 2320 S    0  2.4  15:42.21 java
> > 28486 wasadmin  20   0 1450m 231m 2288 S    0  2.3  15:46.53 java
> > 30625 wasadmin  20   0 1477m 251m 3024 S    0  2.5  15:44.80 java
> >
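Marcy's question about how many JVMs are running, and how much memory they hold, can be answered from the top listing itself. The Python sketch below is a hypothetical helper, not part of top or WebSphere. It parses pasted top process rows, converts the VIRT and RES columns to KiB (top prints bare numbers in KiB and uses m and g suffixes for MiB and GiB), and totals them per command. Because top rounds values such as 1.3g, the totals are approximate.

```python
# Hypothetical helper: total VIRT and RES per command from pasted `top` rows.
from collections import defaultdict

# top prints plain numbers in KiB and uses m/g suffixes for MiB/GiB.
_UNITS = {"": 1, "k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(field: str) -> float:
    """Convert a top size field such as '9968', '253m' or '1.3g' to KiB."""
    suffix = field[-1].lower() if field[-1].isalpha() else ""
    number = field[:-1] if suffix else field
    return float(number) * _UNITS[suffix]


def summarize(rows: str) -> dict:
    """Return {command: (process count, total VIRT KiB, total RES KiB)}."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for line in rows.strip().splitlines():
        cols = line.split()
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) < 12 or not cols[0].isdigit():
            continue  # skip the header and any blank lines
        entry = totals[cols[11]]
        entry[0] += 1
        entry[1] += to_kib(cols[4])
        entry[2] += to_kib(cols[5])
    return {cmd: tuple(vals) for cmd, vals in totals.items()}


sample = """
26850 wasadmin 20 0 1506m 253m 2860 S 18 2.5 16:06.28 java
24641 wasadmin 20 0 7229m 1.3g 3172 S 13 13.1 196:35.52 java
3695 wasadmin 20 0 9968 1352 1352 S 0 0.0 0:00.13 bash
"""
for cmd, (count, virt, res) in summarize(sample).items():
    print(f"{cmd}: {count} processes, VIRT {virt / 1024:.0f} MiB, RES {res / 1024:.0f} MiB")
```

Applied to all the rows quoted above, the same approach counts 30 java processes, two of them resident at about 1.3 GB. The listing was described as partial, so the real total may be higher.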
> > -----------------
> >
> >
> > Here are a few screen grabs from the 3270 console session:
> >
> > Unless you get a _continuous_flood_ of these messages it means
> > everything is working fine. Allocations from irqs cannot be
> > perfectly reliable and the kernel is designed to handle that.
> > java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
> > CPU: 1 Not tainted 2.6.27.45-0.1-default #1
> > Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
> > 0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
> > 000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
> > 00000000014a4e88 0000000000000007 0000000000634e00 0000000000000000
> > 000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
> > 00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
> > Call Trace:
> > ([<0000000000105174>] show_trace+0x130/0x134)
> >  [<000000000019890a>] __alloc_pages_internal+0x406/0x55c
> >  [<00000000001c7056>] cache_grow+0x382/0x458
> >  [<00000000001c7440>] cache_alloc_refill+0x314/0x36c
> >  [<00000000001c6c12>] kmem_cache_alloc+0x82/0x144
> >  [<00000000003228f2>] __alloc_skb+0x82/0x208
> >  [<000000000032378e>] dev_alloc_skb+0x36/0x64
> >  [<000003e0001a030e>] qeth_core_get_next_skb+0x31e/0x704 [qeth]
> >  [<000003e0000d5f8c>] qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
> >  [<000003e0000d6574>] qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
> >  [<000003e0000ebc44>] qdio_kick_inbound_handler+0xbc/0x178 [qdio]
> >  [<000003e0000ee58c>] __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
> >  [<000000000013a800>] tasklet_action+0x10c/0x1e4
> >  [<000000000013b908>] __do_softirq+0xe0/0x1c8
> >  [<0000000000110252>] do_softirq+0xaa/0xb0
> >  [<000000000013b772>] irq_exit+0xc2/0xcc
> >  [<00000000002f6586>] do_IRQ+0x132/0x1c8
> >  [<0000000000114148>] io_return+0x0/0x8
> >  [<00000000002b850e>] _raw_spin_lock_wait+0x86/0xa4
> > ([<000003e047d6fa00>] 0x3e047d6fa00)
> >  [<000000000019eb9c>] shrink_page_list+0x1a0/0x584
> >  [<000000000019f184>] shrink_inactive_list+0x204/0x5b0
> >  [<000000000019f620>] shrink_zone+0xf0/0x1d0
> >  [<000000000019f882>] shrink_zones+0xae/0x184
> >  [<00000000001a02be>] do_try_to_free_pages+0x96/0x3fc
> >  [<00000000001a072c>] try_to_free_pages+0x74/0x7c
> >  [<0000000000198730>] __alloc_pages_internal+0x22c/0x55c
> >  [<000000000019b5a2>] __do_page_cache_readahead+0x10a/0x2ac
> >  [<000000000019b7cc>] do_page_cache_readahead+0x88/0xa8
> >  [<000000000019170e>] filemap_fault+0x33a/0x448
> >  [<00000000001a55bc>] __do_fault+0x78/0x580
> >  [<00000000001a962e>] handle_mm_fault+0x1e6/0x4c0
> >  [<00000000003b9e1e>] do_dat_exception+0x29e/0x388
> >  [<0000000000113c0c>] sysc_return+0x0/0x8
> >  [<0000020000214bde>] 0x20000214bde
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > CPU 1: hi: 186, btch: 31 usd: 0
> > Normal per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > CPU 1: hi: 186, btch: 31 usd: 0
> > Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0
> >  free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
> > DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
> > lowmem_reserve[]: 0 8064 8064
> > Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0
> > DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
> > Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
> > 9283 total pagecache pages
> > 0 pages in swap cache
> > Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
> > Free swap = 0kB
> > Total swap = 35764956kB
> > 2621440 pages RAM
> > 54354 pages reserved
> > 22356 pages shared
> > 2538214 pages non-shared
> > The following is only an harmless informational message.
> > __ratelimit: 4 callbacks suppressed
> > *etc, etc for HUNDREDS of pages...*
> >
> > ----------------------------------------------------------------------
> > For LINUX-390 subscribe / signoff / archive access instructions,
> > send email to [email protected] with the message: INFO LINUX-390 or visit
> > http://www.marist.edu/htbin/wlvindex?LINUX-390
> > ----------------------------------------------------------------------
> > For more information on Linux on System z, visit
> > http://wiki.linuxvm.org/
> >
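The Mem-Info block in the console output can be checked the same way. Each buddy free-list line gives count*size for every block order, and the total after the = sign is their sum. Comparing the Normal zone's total (3856 kB) with its min watermark (10276 kB), in a zone also reported as all_unreclaimable, shows how little was left for the order-0 allocations the qeth receive path was making. The helper names in this Python sketch are hypothetical.

```python
import re

# Hypothetical helpers for the buddy free-list lines printed in Mem-Info,
# e.g. "Normal: 0*4kB 0*8kB 1*16kB ... 3*1024kB = 3856kB".
_BLOCK = re.compile(r"(\d+)\*(\d+)kB")


def free_kb(buddy_line: str) -> int:
    """Sum count*size over every order listed before the '=' sign."""
    blocks = buddy_line.split("=")[0]
    return sum(int(count) * int(size) for count, size in _BLOCK.findall(blocks))


def below_min(zone_free_kb: int, min_kb: int) -> bool:
    """True when a zone's free memory is under its 'min' watermark."""
    return zone_free_kb < min_kb


dma = "DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB"
normal = "Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB"

print("DMA free:", free_kb(dma), "kB")        # matches the 33220kB printed
print("Normal free:", free_kb(normal), "kB")  # matches the 3856kB printed
print("Normal below min (10276kB):", below_min(free_kb(normal), 10276))
```

The per-order breakdown also shows that the Normal zone had no 4 kB or 8 kB blocks left at all, so every order-0 request had to split one of its few larger blocks.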
