Re: Linux on Z/VM running WAS problems - anyone got any tips?

Marcy Cortes Mon, 26 Jul 2010 09:14:32 -0700

Can you post the results of

Q SRM


Q ALLOC PAGE




Marcy




________________________________

From: The IBM z/VM Operating System [mailto:[email protected]] On Behalf 
Of Daniel Tate
Sent: Monday, July 26, 2010 8:31 AM
To: [email protected]
Subject: [IBMVM] Linux on Z/VM running WAS problems - anyone got any tips?


I apologize for this not being a "direct" z/VM question.

I've posted to the Linux-390 group to get the Linux point of view, but to 
explore all angles I'm also trying to find out whether there is anything I can 
set or do from z/VM that would help the situation. I'd like the console to 
"finish" scrolling, but I'm not sure how to do that short of taping down 
Control-C (I'm using c3270). The output of CP Q ALL is at the bottom of all 
this mess.

The e-mail sent to Linux-s390 for reference:

We're running WebSphere on a z9 under z/VM; 4 of 8 systems are live. It is 
running apps that consume around 16GB of memory on a Windows machine. On this 
system we have allocated 10GB of real storage (RAM) and around 35GB of swap. 
When WebSphere starts, it eventually consumes all the memory and the system 
halts, but does not panic. We are running 64-bit. I'm a z/VM novice, so I 
don't know what to try.

Here is some information from our WAS Admin:
"We are running WebSphere 6.1.0.25 with FP EJB3.0,Webservices and Web 2.0 
installed.  There are two nodes running 14 application servers each. there are 
currently 32 applications installed but not currently running.  No security has 
been enabled for WebSphere at this time."


At this point I see two problems:

1) Why is the OOM killer not functioning properly?
2) Why is WebSphere performance so awful?

and I have two questions:

1) Does anyone have any PRACTICAL experience/tips for optimizing SLES11 on 
z/VM? So far we've been using dated case studies and Redbooks that seem to be 
filled with inaccuracies or outdated information.
2) Is there any way to force a core dump via CP, like you can with the magic 
SysRq key?

All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID:    SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (s390x)
Release:    11
Codename:    n/a


Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08,  2 users,  load average: 51.47, 22.20, 10.25
Tasks: 129 total,   4 running, 125 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.7%us, 81.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si,  1.2%st
Mem:  10268344k total, 10220568k used,    47776k free,      548k buffers
Swap: 35764956k total, 35764956k used,        0k free,    56340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16:03.63 java
24920 wasadmin  20   0 1522m 263m 2616 S    0  2.6  16:06.29 java
25422 wasadmin  20   0 1584m 229m 2284 S    0  2.3  16:02.18 java
27892 wasadmin  20   0 1414m 263m 2648 S    0  2.6  15:45.96 java
28184 wasadmin  20   0 1523m 241m 2320 S    0  2.4  15:42.21 java
28486 wasadmin  20   0 1450m 231m 2288 S    0  2.3  15:46.53 java
30625 wasadmin  20   0 1477m 251m 3024 S    0  2.5  15:44.80 java
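
As a rough cross-check of the top listing above, here is a minimal Python sketch that totals the VIRT and RES columns of the java rows and compares them with the RAM and swap totals top reports. The row data and the two totals are copied from the listing; the variable and function names are made up for illustration, and VIRT is address space rather than memory actually in use, so treat the comparison as indicative only.

```python
# Java rows copied from the top listing: PID, VIRT, RES (the bash row is omitted).
JAVA_ROWS = """
26850 1506m 253m
29870 1497m 279m
24607 1502m 223m
24641 7229m 1.3g
26606 1438m 272m
27600 1553m 258m
24638 7368m 1.3g
25609 1528m 219m
30258 1515m 249m
25780 1604m 277m
27106 1458m 273m
27336 1528m 238m
29164 1527m 224m
31400 1509m 259m
25244 1509m 290m
24769 1409m 259m
28796 1338m 263m
26185 1493m 274m
25968 1427m 257m
29495 1466m 259m
25080 1445m 236m
26410 1475m 271m
31027 1413m 238m
24474 1468m 205m
24920 1522m 263m
25422 1584m 229m
27892 1414m 263m
28184 1523m 241m
28486 1450m 231m
30625 1477m 251m
"""


def to_mib(field):
    """Convert a top size field such as '1506m' or '1.3g' to MiB."""
    units = {"k": 1 / 1024, "m": 1, "g": 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024  # top prints bare numbers in KiB


rows = [line.split() for line in JAVA_ROWS.strip().splitlines()]
virt_mib = sum(to_mib(virt) for _pid, virt, _res in rows)
res_mib = sum(to_mib(res) for _pid, _virt, res in rows)

# "Mem: ... total" and "Swap: ... total" from the top header, in kB.
ram_mib = 10268344 / 1024
swap_mib = 35764956 / 1024

print(f"java processes : {len(rows)}")
print(f"sum of VIRT    : {virt_mib:9.0f} MiB")
print(f"sum of RES     : {res_mib:9.0f} MiB")
print(f"RAM            : {ram_mib:9.0f} MiB")
print(f"RAM + swap     : {ram_mib + swap_mib:9.0f} MiB")
```

With these rows the summed VIRT comes out larger than RAM plus swap combined, and the summed RES is close to the whole of RAM, which is at least consistent with swap being completely used.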

-----------------


Here are a few screen grabs from the 3270 console session:

Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.
java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
       000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
       00000000014a4e88 0000000000000007 0000000000634e00 0000000000000000
       000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
       00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
Call Trace:
( 0000000000105174>  show_trace+0x130/0x134)
  000000000019890a>  __alloc_pages_internal+0x406/0x55c
  00000000001c7056>  cache_grow+0x382/0x458
  00000000001c7440>  cache_alloc_refill+0x314/0x36c
  00000000001c6c12>  kmem_cache_alloc+0x82/0x144
  00000000003228f2>  __alloc_skb+0x82/0x208
  000000000032378e>  dev_alloc_skb+0x36/0x64
  000003e0001a030e>  qeth_core_get_next_skb+0x31e/0x704 [qeth]
  000003e0000d5f8c>  qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
  000003e0000d6574>  qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
  000003e0000ebc44>  qdio_kick_inbound_handler+0xbc/0x178 [qdio]
  000003e0000ee58c>  __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
  000000000013a800>  tasklet_action+0x10c/0x1e4
  000000000013b908>  __do_softirq+0xe0/0x1c8
  0000000000110252>  do_softirq+0xaa/0xb0
  000000000013b772>  irq_exit+0xc2/0xcc
  00000000002f6586>  do_IRQ+0x132/0x1c8
  0000000000114148>  io_return+0x0/0x8
  00000000002b850e>  _raw_spin_lock_wait+0x86/0xa4
( 000003e047d6fa00>  0x3e047d6fa00)
  000000000019eb9c>  shrink_page_list+0x1a0/0x584
  000000000019f184>  shrink_inactive_list+0x204/0x5b0
  000000000019f620>  shrink_zone+0xf0/0x1d0
  000000000019f882>  shrink_zones+0xae/0x184
  00000000001a02be>  do_try_to_free_pages+0x96/0x3fc
  00000000001a072c>  try_to_free_pages+0x74/0x7c
  0000000000198730>  __alloc_pages_internal+0x22c/0x55c
  000000000019b5a2>  __do_page_cache_readahead+0x10a/0x2ac
  000000000019b7cc>  do_page_cache_readahead+0x88/0xa8
  000000000019170e>  filemap_fault+0x33a/0x448
  00000000001a55bc>  __do_fault+0x78/0x580
  00000000001a962e>  handle_mm_fault+0x1e6/0x4c0
  00000000003b9e1e>  do_dat_exception+0x29e/0x388
  0000000000113c0c>  sysc_return+0x0/0x8
  0000020000214bde>  0x20000214bde
Mem-Info:
DMA per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0
 free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
lowmem_reserve[]: 0 8064 8064
Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
9283 total pagecache pages
0 pages in swap cache
Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
Free swap  = 0kB
Total swap = 35764956kB
2621440 pages RAM
54354 pages reserved
22356 pages shared
2538214 pages non-shared
The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.
java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
       000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696
       00000000014a5dd3 0000000000000007 0000000000634e00 0000000000000000
       000000000000000d 0000000000000000 000000027fbcf818 000000000000000e
       00000000003cdc00 000000000010521a 000000027fbcf7b0 000000027fbcf7f8
Call Trace:
( 0000000000105174>  show_trace+0x130/0x134)
  000000000019890a>  __alloc_pages_internal+0x406/0x55c
  00000000001c7056>  cache_grow+0x382/0x458
  00000000001c7440>  cache_alloc_refill+0x314/0x36c
  00000000001c6c12>  kmem_cache_alloc+0x82/0x144
  00000000003228f2>  __alloc_skb+0x82/0x208
  000000000032378e>  dev_alloc_skb+0x36/0x64
  000003e0001a030e>  qeth_core_get_next_skb+0x31e/0x704 [qeth]
  000003e0000d5f8c>  qeth_l3_process_inbound_buffer+0x9c/0x598 [qeth_l3]
  000003e0000d6574>  qeth_l3_qdio_input_handler+0xec/0x268 [qeth_l3]
  000003e0000ebc44>  qdio_kick_inbound_handler+0xbc/0x178 [qdio]
  000003e0000ee58c>  __tiqdio_inbound_processing+0x394/0xdf4 [qdio]
  000000000013a800>  tasklet_action+0x10c/0x1e4
  000000000013b908>  __do_softirq+0xe0/0x1c8
  0000000000110252>  do_softirq+0xaa/0xb0
  000000000013b772>  irq_exit+0xc2/0xcc
  00000000002f6586>  do_IRQ+0x132/0x1c8
  0000000000114148>  io_return+0x0/0x8
  00000000002b850e>  _raw_spin_lock_wait+0x86/0xa4
( 000003e047d6fa00>  0x3e047d6fa00)
  000000000019eb9c>  shrink_page_list+0x1a0/0x584
  000000000019f184>  shrink_inactive_list+0x204/0x5b0
  000000000019f620>  shrink_zone+0xf0/0x1d0
  000000000019f882>  shrink_zones+0xae/0x184
  00000000001a02be>  do_try_to_free_pages+0x96/0x3fc
  00000000001a072c>  try_to_free_pages+0x74/0x7c
  0000000000198730>  __alloc_pages_internal+0x22c/0x55c
  000000000019b5a2>  __do_page_cache_readahead+0x10a/0x2ac
  000000000019b7cc>  do_page_cache_readahead+0x88/0xa8
  000000000019170e>  filemap_fault+0x33a/0x448
  00000000001a55bc>  __do_fault+0x78/0x580
  00000000001a962e>  handle_mm_fault+0x1e6/0x4c0
  00000000003b9e1e>  do_dat_exception+0x29e/0x388
  0000000000113c0c>  sysc_return+0x0/0x8
  0000020000214bde>  0x20000214bde
Mem-Info:
DMA per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
Active:1355277 inactive:1132712 dirty:0 writeback:0 unstable:0
 free:9269 slab:17875 mapped:765 pagetables:24402 bounce:0
DMA free:33220kB min:2568kB low:3208kB high:3852kB active:1092112kB inactive:926924kB present:2064384kB pages_scanned:21132286 all_unreclaimable? no
lowmem_reserve[]: 0 8064 8064
Normal free:3856kB min:10276kB low:12844kB high:15412kB active:4328996kB inactive:3603924kB present:8257536kB pages_scanned:44557906 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 101*4kB 32*8kB 473*16kB 195*32kB 49*64kB 30*128kB 8*256kB 3*512kB 8*1024kB = 33220kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 3*1024kB = 3856kB
9283 total pagecache pages
0 pages in swap cache
Swap cache stats: add 34513958, delete 34513958, find 6612011/8393146
Free swap  = 0kB
Total swap = 35764956kB
2621440 pages RAM
54354 pages reserved
22356 pages shared
2538214 pages non-shared
__ratelimit: 4 callbacks suppressed
The following is only an harmless informational message.
Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.
java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 00000001ab64c638, ksp: 0000000215bbb5e0)
0000000000000000 000000027fbcf7b0 0000000000000002 0000000000000000
       000000027fbcf850 000000027fbcf7c8 000000027fbcf7c8 00000000003b6696

etc., etc., for HUNDREDS of pages, perhaps indefinitely.
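
The Mem-Info figures in the console output above can be sanity-checked with a little arithmetic. The sketch below, in Python, re-adds the buddy-allocator free lists for each zone, compares each zone's free memory with its min watermark, and converts the "pages RAM" count to GiB. The numbers are copied from the dump; the 4kB page size is an assumption inferred from the "N*4kB" entries, and the names are illustrative.

```python
import re

PAGE_KB = 4  # assumed page size, inferred from the "N*4kB" buddy entries

# Per-zone values copied from the Mem-Info dump.
ZONES = {
    "DMA": {
        "free_kb": 33220,
        "min_kb": 2568,
        "buddy": "101*4kB 32*8kB 473*16kB 195*32kB 49*64kB "
                 "30*128kB 8*256kB 3*512kB 8*1024kB",
    },
    "Normal": {
        "free_kb": 3856,
        "min_kb": 10276,
        "buddy": "0*4kB 0*8kB 1*16kB 0*32kB 0*64kB "
                 "0*128kB 1*256kB 1*512kB 3*1024kB",
    },
}


def buddy_total_kb(buddy):
    """Sum 'count*sizekB' entries from a buddy free-list line."""
    pairs = re.findall(r"(\d+)\*(\d+)kB", buddy)
    return sum(int(count) * int(size) for count, size in pairs)


# Zones whose free memory has dropped under the min watermark.
below_min = [name for name, z in ZONES.items() if z["free_kb"] < z["min_kb"]]

for name, z in ZONES.items():
    total = buddy_total_kb(z["buddy"])
    print(f"{name:6s} buddy sum {total:6d} kB, free {z['free_kb']:6d} kB, "
          f"min {z['min_kb']:6d} kB")
print("zones below min watermark:", below_min)

# "2621440 pages RAM" converted to GiB.
ram_gib = 2621440 * PAGE_KB / (1024 * 1024)
print(f"pages RAM = {ram_gib} GiB")
```

On these numbers the Normal zone is below its min watermark while DMA is not, and 2621440 pages works out to 10GiB, matching the storage allocated to the guest.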

00:
00: CP Q ALL
00: STORAGE = 15G CONFIGURED = 15G INC = 64M STANDBY = 0  RESERVED = 0
00: OSA  039C ATTACHED TO TCPIP    039C DEVTYPE OSA         CHPID 01 OSD
00: OSA  039D ATTACHED TO TCPIP    039D DEVTYPE OSA         CHPID 01 OSD
00: OSA  039E ATTACHED TO TCPIP    039E DEVTYPE OSA         CHPID 01 OSD
00: OSA  03A0 ATTACHED TO DTCVSW2  03A0 DEVTYPE OSA         CHPID 01 OSD
00: OSA  03A1 ATTACHED TO DTCVSW2  03A1 DEVTYPE OSA         CHPID 01 OSD
00: OSA  03A2 ATTACHED TO DTCVSW2  03A2 DEVTYPE OSA         CHPID 01 OSD
00: OSA  03C0 ATTACHED TO DTCVSW1  03C0 DEVTYPE OSA         CHPID 02 OSD
00: OSA  03C1 ATTACHED TO DTCVSW1  03C1 DEVTYPE OSA         CHPID 02 OSD
00: OSA  03C2 ATTACHED TO DTCVSW1  03C2 DEVTYPE OSA         CHPID 02 OSD
00: FCP  5000 ATTACHED TO LINXDEV  5000 CHPID 46
00:      WWPN C05076FAE3000400
00: FCP  5001 ATTACHED TO LINXD001 5001 CHPID 46
00:      WWPN C05076FAE3000404
00: FCP  5002 ATTACHED TO LINXD002 5002 CHPID 46
00:      WWPN C05076FAE3000408
00: FCP  5003 ATTACHED TO LINXD003 5003 CHPID 46
00:      WWPN C05076FAE300040C
00: FCP  5100 ATTACHED TO LINXDEV  5100 CHPID 47
00:      WWPN C05076FAE3000900
00: FCP  5101 ATTACHED TO LINXD001 5101 CHPID 47
00:      WWPN C05076FAE3000904
00: FCP  5102 ATTACHED TO LINXD002 5102 CHPID 47
00:      WWPN C05076FAE3000908
00: FCP  5103 ATTACHED TO LINXD003 5103 CHPID 47
00:      WWPN C05076FAE300090C
00: DASD 9F7D CP SYSTEM VM6LXD   0
00: DASD 9F7E CP SYSTEM VM6LXE   0
00: DASD 9F80 CP SYSTEM VM6LX9   2
00: DASD 9F81 CP SYSTEM VM6LXA   2
00: DASD 9F82 CP SYSTEM VM6LXB   0
00: DASD 9F83 CP SYSTEM VM6LXC   0
00: DASD 9F84 CP OWNED  VM6RES   135
00: DASD 9F85 CP OWNED  VM6SPL   0
00: DASD 9F86 CP OWNED  VM6PG1   0
00: DASD 9F87 CP OWNED  VM6PG2   0
00: DASD 9F88 CP OWNED  VM6LX1   4
00: DASD 9F89 CP SYSTEM VM6LX2   0
00: DASD 9F8A CP SYSTEM VM6LX3   0
00: DASD 9F8B CP SYSTEM VM6LX4   0
00: DASD 9F8C CP SYSTEM VM6LX5   2
00: DASD 9F8D CP SYSTEM VM6LX6   0
00: DASD 9F8E CP SYSTEM VM6LX7   0
00: DASD 9F8F CP SYSTEM VM6LX8   2
00: DASD 9FC7 CP SYSTEM VM6LX6   0
00: DASD 9FC8 CP SYSTEM VM6LX5   2
00: DASD 9FC9 CP SYSTEM VM6LX2   0
00: DASD 9FCA CP SYSTEM VM6LX4   0
00: DASD 9FCB CP SYSTEM VM6LX3   0
00: DASD 9FCE CP SYSTEM VM6LX1   4
00: DASD 9FCF CP SYSTEM VM6PG2   0
00: DASD 9FD0 CP SYSTEM VM6PG1   0
00: DASD 9FD1 CP SYSTEM VM6SPL   0
00: DASD 9FD2 CP SYSTEM VM6RES   135
