Hi Joe,

The stack traces for ldmd look OK, but it occurred to me that I wasn't specific with my request: could you run 'pstack `pgrep ldmd`' on the source and target machines while ldm is hung (I'm not sure if you did)?
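
To make the source and target dumps easier to compare, a small parser can line up what each LWP is blocked in. The following is a minimal Python sketch, not part of ldmd or the LDoms tooling: `summarize_pstack` and `threads_in` are hypothetical helper names, and the regular expressions assume the pstack layout seen in this thread (hex address, function name, argument list). Continuation lines created by email wrapping are simply ignored.

```python
import re

# Matches the separator pstack prints before each thread, e.g.
# "-----------------  lwp# 19 / thread# 19  --------------------"
LWP_HEADER = re.compile(r"^-+\s+lwp#\s+(\d+)\s+/\s+thread#\s+(\d+)\s+-+$")

# Matches one frame line: hex address, function name, then the "(" that
# opens the argument list. Assumed layout, based on the dumps in this thread.
FRAME = re.compile(r"^\s*([0-9a-f]{8,16})\s+(\S+)\s*\(")


def summarize_pstack(text):
    """Map each LWP id to its function names, innermost frame first."""
    stacks = {}
    current = None
    for line in text.splitlines():
        header = LWP_HEADER.match(line.strip())
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(2))
    return stacks


def threads_in(stacks, func):
    """Return the sorted LWP ids whose stack contains the exact name func."""
    return sorted(lwp for lwp, frames in stacks.items() if func in frames)
```

Saving each machine's pstack output to a file and passing its contents to `summarize_pstack` gives one dictionary per machine; `threads_in(stacks, "xmpp_reader")`, for example, lists any LWPs currently inside that function, which is one way to spot a thread waiting on a network read while ldm is hung.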

Does a 'svcadm restart ldmd' allow further ldm commands to work (and avoid the reboot)?

Regards,
Liam


On 22/04/2011 16:20, Manek, Joe A (SAIC) wrote:
Thanks for the help Liam.

Yes, the NFS mount where aktst1's disk images are is available on the
target machine.  The same vds server and vsw is available as well,
although I would hope the dryrun should at least let me know if it
wasn't.

Checked /var/svc/log/ldoms-ldmd:default.log, didn't see anything of any
note.  Tailed it during another migrate attempt, nary a new line of
output added.  The following is output on each reboot/restart.

[ Apr 22 07:18:13 Method "start" exited with status 0 ]
warning: fma_cpu_svc_get_p_status: FMA cpu operations to some domains may not be not available
Power Management policy == performance mode.
warning: Autosave config 'aktst1_4cpu_3g' is newer than SP config

On the source:

# ldm ls-services
VCC
     NAME             LDOM             PORT-RANGE
     primary-vcc0     primary          5000-5100

VSW
     NAME             LDOM             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID PVID VID                  MTU   MODE
     primary-vsw0     primary          00:14:4f:f8:1c:0f e1000g0   0    switch@0              1               1                         1500

VDS
     NAME             LDOM             VOLUME         OPTIONS          MPGROUP        DEVICE
     primary-vds0     primary          aktst1_disk0                                   /nfs_ldoms/aktst1/disk0
                                       aktst1_disk1                                   /nfs_ldoms/aktst1/disk1
     primary-swap     primary          aktst1_swap0                                   /nfs_ldoms/aktst1/swap0

On the target:

aku290# ldm ls
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      8     8G       8.6%  162d 7h
aku261           active     -n----  5002    4     4G       0.4%  162d 7h
aku291           active     -n----  5000    16    8G        10%  64d 3h 3m
aku292           active     -n----  5001    16    8G        41%  64d 2h 31m
aku293           active     -n----  5003    8     12G      0.4%  162d 7h
aku290# ldm ls-services
VCC
     NAME             LDOM             PORT-RANGE
     primary-vcc0     primary          5000-5100

VSW
     NAME             LDOM             MAC               NET-DEV   DEVICE     DEFAULT-VLAN-ID PVID VID                  MTU   MODE
     primary-vsw0     primary          00:14:4f:fb:61:83 nxge0     switch@0   1               1                         1500
     primary-vsw1     primary          00:14:4f:fb:46:e5 nxge1     switch@1   1               1                         1500

VDS
     NAME             LDOM             VOLUME         OPTIONS          MPGROUP        DEVICE
     primary-vds0     primary          aku291_disk0                                   /ldoms/prod/aku291/disk0
                                       aku291_disk1                                   /ldoms/prod/aku291/disk1
                                       aku292_disk0                                   /ldoms/prod/aku292/disk0
                                       aku292_disk1                                   /ldoms/prod/aku292/disk1
                                       aku261-vol0                                    /ldoms/prod/aku261/disk0
                                       aku261-vol1                                    /ldoms/prod/aku261/disk1
                                       aku261-solarisdvd ro                           /datasets/techsup/Solaris/10/sol-10-u7-ga-sparc-dvd-iso
                                       aku293_disk0                                   /ldoms/prod/aku293/disk0
                                       aku293_disk1                                   /ldoms/prod/aku293/disk1
                                       aku261-vol2                                    /ldoms/prod/aku261/disk2
     primary-swap     primary          aku291_swap0                                   /ldoms/prod/aku291/swap0
                                       aku292_swap0                                   /ldoms/prod/aku292/swap0
                                       aku293_swap0                                   /ldoms/prod/aku293/swap0


On the source:

# pstack `pgrep ldmd`
249:    /opt/SUNWldm/bin/ldmd
-----------------  lwp# 1 / thread# 1  --------------------
  feec9a58 lwp_park (0, 0, 0)
  feec3aa0 cond_wait_queue (e0598, e0580, 0, 0, 1c00, 0) + 4c
  feec3fe8 cond_wait (e0598, e0580, 0, 0, e0580, ff04b7b0) + 10
  00080f54 sequence (0, 32c, 0, c5b28, e05c4, 18) + 3c
  00064684 main     (830c0, e2400, 1, 0, e2400, bf8a8) + 488
  00021c08 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
  feecd284 pollsys  (feb7bf98, 1, 0, 0)
  fee63adc poll     (feb7bf98, 1, ffffffff, 0, b49a4, 14eeb8) + 7c
  0004beac hvctl_poll (0, bc400, e22d8, e02a0, bc5c4, 14eeb8) + 44
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
  feecd018 ioctl    (5, 64737002, fea7bf98)
  0004c2d8 pri_poll (0, fea7c000, 0, 1, fea7bf98, fe9603a0) + f0
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 4 / thread# 4  --------------------
  feec9a58 lwp_park (0, 0, 0)
  feec3aa0 cond_wait_queue (e0420, e0408, 0, 0, 1c00, 0) + 4c
  feec3fe8 cond_wait (e0420, e0408, 0, 0, e0408, ff04b7b0) + 10
  0005ad6c msg_handler (0, 1, 14be48, e0408, b3870, 0) + 30
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 5 / thread# 5  --------------------
  feecc9d0 portfs   (5, 6, fe7fbf90, 0, 0, 0)
  0005c564 comm_io  (4, 1, fe7fbf90, e1684, 5bef8, e1400) + 18
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 6 / thread# 6  --------------------
  feecdc20 door     (fe6fa188, 4, 0, 0, fe6fbfa0, 4)
-----------------  lwp# 7 / thread# 7  --------------------
  feecdc20 door     (0, 0, 0, 0, fe5fbfa0, 4)
-----------------  lwp# 8 / thread# 8  --------------------
  feeccb1c recvfrom (c, 205d08, 138, 0, e17a8, fe3fbf9c)
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 9 / thread# 9  --------------------
  feec9a58 lwp_park (0, fdf7bf28, 0)
  feec3aa0 cond_wait_queue (ff076088, ff0760a8, fdf7bf28, 0, febd3a00, 0) + 4c
  feec3ee4 cond_wait_common (ff076088, ff0760a8, fdf7bf28, 0, 0, 0) + 294
  feec4078 _cond_timedwait (ff076088, ff0760a8, fdf7bf90, 0, ff06c000, fdf7bf98) + 34
  ff049c58 umem_update_thread (4db1985e, 4db19854, 0, ff06c000, ff06f2e0, 4db1985e) + 278
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 10 / thread# 10  --------------------
  feecc9d0 portfs   (5, 11, fde7bf7c, 0, 0, 0)
  000b2460 xmpp_loop (4, b30f8, e2400, fde7bf70, 7f000001, fde7bf90) + 1e8
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 11 / thread# 11  --------------------
  feecc9d0 portfs   (5, 10, fdd7bf90, 0, 0, 0)
  00073290 migration_loop (0, c2f48, fdd7bf90, b3b81, 73254, c2c00) + 3c
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 12 / thread# 12  --------------------
  feecc9d0 portfs   (5, 14, fdc7bf4c, 0, 0, 0)
  0004d410 client_loop (4cdec, c, bc910, 4d460, fdc7bf6c, 16) + 1c8
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 13 / thread# 13  --------------------
  feecc8ac nanosleep (fdb7bf98, 0)
  00088da0 guest_util_poller (0, e0400, e0400, e1800, 3, e0620) + a8
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 14 / thread# 14  --------------------
  feecc8ac nanosleep (fda7bf10, fda7bf08)
  0002b910 pmi_poll_policy (1ca468, 0, b35d4, e143c, 1, b35ec) + 1f8
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 16 / thread# 16  --------------------
  feecc8ac nanosleep (fd97bf94, 0)
  0002ede4 pmi_snmp_get_policy_loop (e14a8, b4800, df524, 1, e14a8, e148c) + 264
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 19 / thread# 19  --------------------
  feecd398 read     (17, 264008, 5)
  fe15d9fc sock_read (116f58, 264008, 5, fe22c348, fe15d9d8, 0) + 24
  fe15b8d0 BIO_read (feec2fe8, 264008, 5, 5, 116f58, c87f0) + cc
  fe2b9898 ssl3_read_n (204e08, 5, 5, 0, 0, 3) + 154
  fe2b9a14 ssl3_get_record (204e08, 204e08, f0, 4400, f1, 11b7b0) + e0
  fe2ba5bc ssl3_read_bytes (204e08, 17, 245708, 400, 0, 0) + 1fc
  fe2b9510 ssl3_read_internal (204e08, 245708, 400, 0, 0, b1f68) + 44
  000b2fac xmpp_reader (11b890, fd87c000, 400, 0, 245708, 4) + 40
  feec99b8 _lwp_start (0, 0, 0, 0, 0, 0)

On the target:

aku290# pstack `pgrep ldmd`
520:    /opt/SUNWldm/bin/ldmd
-----------------  lwp# 1 / thread# 1  --------------------
  fef48a08 lwp_park (0, 0, 0)
  fef42a1c cond_wait_queue (b2da0, b2d88, 0, 0, 1c00, 0) + 4c
  fef42f64 cond_wait (b2da0, b2d88, 0, 0, b2d88, ff04b480) + 10
  00065140 sequence (0, 18, 0, ffbffb58, b2d88, be518) + 40
  0004fcbc main     (c8800, b9c58, 0, c8800, 4, 3) + 454
  0001f4d0 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
  fef4c24c pollsys  (feb7bf98, 1, 0, 0)
  feee2ba4 poll     (feb7bf98, 1, ffffffff, 0, b4400, 129c28) + 7c
  0002ca2c hvctl_poll (0, c8508, b2000, b5800, 1, 3) + 4c
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
  fef4bfe0 ioctl    (5, 64737002, fea7bf98)
-----------------  lwp# 4 / thread# 4  --------------------
  fef48a08 lwp_park (0, fe8fbf28, 0)
  fef42a1c cond_wait_queue (ff073ec0, ff073ee0, fe8fbf28, 0, febd1200, 0) + 4c
  fef42e60 cond_wait_common (ff073ec0, ff073ee0, fe8fbf28, 0, 0, 0) + 294
  fef42ff4 _cond_timedwait (ff073ec0, ff073ee0, fe8fbf90, 0, ff06a000, fe8fbf98) + 34
  ff049a9c umem_update_thread (4db19869, 4db1985f, 0, ff06a000, ff06d120, 4db19869) + 278
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 5 / thread# 5  --------------------
  fef48a08 lwp_park (0, 0, 0)
  fef42a1c cond_wait_queue (b2238, b2220, 0, 0, 1c00, 0) + 4c
  fef42f64 cond_wait (b2238, b2220, 0, 0, b2220, ff04b480) + 10
  00039844 msg_handler (0, 1, 11fe48, b2220, 94d88, 0) + 30
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 6 / thread# 6  --------------------
  fef4b980 portfs   (5, 6, fe6fbf90, 0, 0, 0)
  0003b05c comm_io  (4, 1, fe6fbf90, c7a04, 3a9e8, c7800) + 18
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 7 / thread# 7  --------------------
  fef4cc18 door     (fe5fbad0, 4, 0, 0, fe5fbfa0, 4)
-----------------  lwp# 8 / thread# 8  --------------------
  fef4cc18 door     (0, 0, 0, 0, fe4fbfa0, 4)
-----------------  lwp# 9 / thread# 9  --------------------
  fef4bacc recvfrom (d, 2bfd08, 138, 0, c7c48, fe2fbf9c)
-----------------  lwp# 10 / thread# 10  --------------------
  fef4b980 portfs   (5, 11, fdefbf7c, 0, 0, 0)
  00093cd8 xmpp_loop (4, 94988, c7400, c879c, c7530, 12) + 1dc
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 11 / thread# 11  --------------------
  fef4b980 portfs   (5, 10, fddfbf90, 0, 0, 0)
-----------------  lwp# 12 / thread# 12  --------------------
  fef4b980 portfs   (5, 14, fdcfbf4c, 0, 0, 0)
  0002dec4 client_loop (2d8ac, 2d800, b5c00, b5cf4, 15, c79d0) + 1c0
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 13 / thread# 13  --------------------
  fef4b85c nanosleep (fdbfbf98, 0)
  0006c13c guest_util_poller (3, fdbfc000, b2c00, c7c00, c7c70, 3) + a4
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 15 / thread# 15  --------------------
  fef4b85c nanosleep (fdafbf28, fdafbf20)
  00025df0 pmi_poll_policy (2ce448, c8400, 0, b1ff8, c7934, fdafbf98) + 11c
  fef48968 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 18 / thread# 18  --------------------
  fef4c360 read     (17, 3d4008, 5)

-----Original Message-----
From: Liam Merwick [mailto:[email protected]]
Sent: Friday, April 22, 2011 12:48 AM
To: Manek, Joe A (SAIC)
Cc: [email protected]
Subject: Re: [ldoms-discuss] Ldm v 2.0 "ldm migrate -n " that hangs on a T5120

Hi Joe,

On 21/04/2011 23:34, Manek, Joe A (SAIC) wrote:
Hello all,

When attempting a dry-run migration, "ldm migrate -n aktst1
root@akldom2:aktst1", the command never completes. There are ample
resources on the target LDOM, but there never appears to be any attempt
to contact the target LDOM. The command hangs and never completes.
'prstat' or 'top' reveal no CPU being used for anything other than
normal stay-alive kind of OS processes.

In addition to the hung migration command, all subsequent 'ldm'
commands of any type also hang.

If the original 'ldm migrate' command process is killed, the terminal
session is freed, but no further 'ldm' commands of any type will
complete. A reboot fixes the hangs on various 'ldm' commands, but any
'ldm migrate' command will again hang and lock up all 'ldm' commands.
I've tried this with the guest LDOM bound, unbound, running, and
not running; it is always the same one-and-done hang.

The Control-LDOM is running the proper recommended release, ldm v 2.0,
and a very current Recommended patch level: "SunOS aku100 5.10
Generic_144488-09 sun4v sparc SUNW,SPARC-Enterprise-T5120". The T5120
is running the latest recommended Firmware Release. The system has been
hard power-cycled since the various FW/OS/patching efforts.

Any input/experiences would be appreciated.


That isn't a symptom I recognise. A good starting place would be to
double-check that the disk images are accessible on both the source and
target machines. Also have a look in the SMF log for ldmd for any
warnings (/var/svc/log/ldoms-ldmd:default.log).

What does 'pstack `pgrep ldmd`' show on the source and target machines?

Regards,
Liam



Thanks,

Joe.


# prtdiag -v
<snip>
============================ FW Version ============================
Version
------------------------------------------------------------
Sun System Firmware 7.3.0.c 2011/01/04 19:00
<snip>


# ldm -V

Logical Domain Manager (v 2.0)
Hypervisor control protocol v 1.6
Using Hypervisor MD v 1.3

System PROM:
Hypervisor v. 1.9.2. @(#)Hypervisor 1.9.2.e 2011/01/04 17:24\015

OpenBoot v. 4.32.2. @(#)OpenBoot 4.32.2.b 2010/12/21 20:20

# cat /etc/release
Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC
Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
Assembled 11 August 2010

# ldm ls -l
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 3G 0.1% 1h 4m

SOFTSTATE
Solaris running

UUID
3637e14e-67d3-4686-a1d1-d6dd7334aa90

MAC
00:21:28:3f:98:2e

HOSTID
0x853f982e

CONTROL
failure-policy=ignore

DEPENDENCY
master=

CORE
CID CPUSET
0 (0, 1, 2, 3, 4, 5, 6, 7)

VCPU
VID PID CID UTIL STRAND
0 0 0 0.3% 100%
1 1 0 0.1% 100%
2 2 0 0.2% 100%
3 3 0 0.0% 100%
4 4 0 0.3% 100%
5 5 0 0.1% 100%
6 6 0 0.1% 100%
7 7 0 0.1% 100%

MEMORY
RA PA SIZE
0x8000000 0x8000000 3G

VARIABLES
auto-boot?=false
boot-device=disk:d
keyboard-layout=US-English
screen-#rows=24

IO
DEVICE PSEUDONYM OPTIONS
pci@0 pci
niu@80 niu
pci@0/pci@0/pci@8/pci@0/pci@9 MB/RISER0/PCIE0
pci@0/pci@0/pci@8/pci@0/pci@1 MB/RISER1/PCIE1
pci@0/pci@0/pci@9 MB/RISER2/PCIE2
pci@0/pci@0/pci@1/pci@0/pci@2 MB/NET0
pci@0/pci@0/pci@1/pci@0/pci@3 MB/NET2
pci@0/pci@0/pci@2 MB/SASHBA

VCC
NAME PORT-RANGE
primary-vcc0 5000-5100

VSW
NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE

primary-vsw0 00:14:4f:f8:1c:0f e1000g0 0 switch@0 1 1 1500

VDS
NAME VOLUME OPTIONS MPGROUP DEVICE
primary-vds0 aktst1_disk0 /nfs_ldoms/aktst1/disk0
aktst1_disk1 /nfs_ldoms/aktst1/disk1
primary-swap aktst1_swap0 /nfs_ldoms/aktst1/swap0

VCONS
NAME SERVICE PORT
SP


------------------------------------------------------------------------------
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
aktst1 bound ------ 5000 4 3G

UUID
f54f9f25-6fcc-4499-e1d8-ede5ec2d8f7b

MAC
00:14:4f:fa:8e:15

HOSTID
0x84fa8e15

CONTROL
failure-policy=ignore

DEPENDENCY
master=

CORE
CID CPUSET
1 (8, 9, 10, 11)

VCPU
VID PID CID UTIL STRAND
0 8 1 100%
1 9 1 100%
2 10 1 100%
3 11 1 100%

MEMORY
RA PA SIZE
0x8000000 0xc8000000 3G

VARIABLES
auto-boot?=false

NETWORK
NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP

vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:6e:bd 1 1500

DISK
NAME VOLUME TOUT ID DEVICE SERVER MPGROUP
disk0 aktst1_disk0@primary-vds0 0 disk@0 primary
disk1 aktst1_disk1@primary-vds0 1 disk@1 primary
swap0 aktst1_swap0@primary-swap 2 disk@2 primary

VCONS
NAME SERVICE PORT
aktst1 primary-vcc0@primary 5000





_______________________________________________
ldoms-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/ldoms-discuss

