On Tue, 02 Dec 2008 10:24:59 +0100, Pavel Filipensky <Pavel.Filipensky at
sun.com> wrote:
> usefully attempt an all-zones umountall'. Fix for "6675447 NFSv4 client
> hangs on shutdown if server is down beforehand" has added the '-l' flag
> (limit actions to the local file systems) to svc.startd:
> system("/sbin/umountall -l"). with this fix, we no longer unmount NFS there.
a bit off-topic in the context of this proposal, but still...
fwiw, I have to disagree here: despite the assumption above, I can
show you that 'umountall -l' still does attempt to unmount NFS file systems.
see: http://bugs.opensolaris.org/view_bug.do?bug_id=6544130
unfortunately my latest update from yesterday is not yet public.
what is happening is that 'umountall -l' does indeed trigger a 'umount -a'
for autofs-triggered NFS mounts still in mnttab, regardless of the theory
stated above.
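to illustrate the mechanism (a hedged, standalone sketch -- this is not the
actual /sbin/umountall script, and the sample mnttab contents are made up
from the dump below): filtering mnttab by fstype is not enough, because the
autofs entries and the NFS mounts triggered underneath them are still listed
there:

```shell
# hypothetical sketch of the mnttab filtering problem -- not the real
# /sbin/umountall code.  even if "local only" filtering tries to drop
# fstype "nfs", the autofs entries (and the NFS mounts triggered under
# them) are still present in mnttab, so a later 'umount -a <paths>'
# still touches them.
mnttab_sample='xenbld.sfbay:/export/xVM-Server /net/xenbld.sfbay/export/xVM-Server nfs - -
-hosts /net autofs - -
/dev/dsk/c0t0d0s0 / ufs - -'

# list everything that is NOT a plain local file system
# (mnttab fields: special, mount point, fstype, options, time)
printf '%s\n' "$mnttab_sample" | awk '$3 == "nfs" || $3 == "autofs" { print $2 }'
```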
re-doing my update to the above bug here to show you the picture:
<snip>
so I was finally able to gather a crash dump of this event. see comments section
for the path and attached threadlist.
the picture, with a shutdown initiated after the network had been taken down
but before the idle autofs NFS mounts had been unmounted, looked like this:
> ::status
debugging crash dump vmcore.0 (64-bit) from opteron
operating system: 5.11 snv_102 (i86pc)
### autofs/nfs mounts:
ffffff01ade36eb8 autofs /net
ffffff01ade36de8 autofs /home
ffffff01ade36428 autofs /net/xenbld.sfbay/sp1
ffffff01ade36358 autofs /net/xenbld.sfbay/dskpool
ffffff01ade36288 autofs /net/xenbld.sfbay/export
ffffff01ade361b8 autofs /net/xenbld.sfbay/export/build
ffffff01ade360e8 autofs /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36018 autofs /net/xenbld.sfbay/export/xVM-Server/gates
ffffff01cd891ec0 autofs /net/xenbld.sfbay/export/xVM-Server/builds
ffffff01cd891df0 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i14
ffffff01cd891d20 autofs /net/xenbld.sfbay/export/xVM-Server/builds-i14
ffffff01cd891c50 autofs /net/xenbld.sfbay/export/xVM-Server/builds-i13
ffffff01cd891b80 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i12b7
ffffff01cd891ab0 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i1...
ffffff01cd8919e0 autofs /net/xenbld.sfbay/export/xVM-Server/producti...
ffffff01cd891910 nfs /net/xenbld.sfbay/export/xVM-Server/builds
### the same messages on the console about unkillable autofs services and
### nfs client mounts, and:
NOTICE: [NFS4][Server: xenbld.sfbay][Mntpt:
/net/xenbld.sfbay/export/xVM-Server]NFS server xenbld.sfbay not responding;
still trying
### process tree still active:
> ::ptree
fffffffffbc29430 sched
ffffff01a99cba48 fsflush
ffffff01a99cc6a8 pageout
ffffff01a99cd308 init
ffffff01ae948320 automountd
ffffff01ae93c8e8 automountd
ffffff01a99b4538 powerd
ffffff01a99c9528 svc.configd
ffffff01a99ca188 svc.startd
ffffff01a99c4a50 sh
ffffff01a99c56b0 umountall
ffffff01a99b8318 umountall
ffffff01a99c0c70 umount
ffffff01ae9438e0 tail
### what's umountall doing?
> ffffff01a99b8318::ps -tf
S PID PPID PGID SID UID FLAGS ADDR NAME
R 3224 3212 7 7 0 0x42000000 ffffff01a99b8318 /sbin/sh
/sbin/umountall -l
T 0xffffff01aa1b4820 <TS_SLEEP>
> 0xffffff01aa1b4820::findstack -v
stack pointer for thread ffffff01aa1b4820: ffffff0007ea9c70
[ ffffff0007ea9c70 _resume_from_idle+0xf1() ]
ffffff0007ea9ca0 swtch+0x160()
ffffff0007ea9d00 cv_wait_sig_swap_core+0x170(ffffff01a99b83d8,
fffffffffbcd65f8, 0)
ffffff0007ea9d20 cv_wait_sig_swap+0x18(ffffff01a99b83d8, fffffffffbcd65f8)
ffffff0007ea9dc0 waitid+0x2a0(0, c9a, ffffff0007ea9dd0, 83)
ffffff0007ea9ec0 waitsys32+0x30(0, c9a, 8047560, 83)
ffffff0007ea9f10 sys_syscall32+0x101()
### what's the currently active umount doing?
> ffffff01a99c0c70::ps -tf
S PID PPID PGID SID UID FLAGS ADDR NAME
R 3226 3224 7 7 0 0x4a004000 ffffff01a99c0c70 umount -a
/net/xenbld.sfbay/export/xVM-Server/production-IPS-server /net/xenbld
T 0xffffff01aa1aabe0 <TS_SLEEP>
> 0xffffff01aa1aabe0::findstack -v
stack pointer for thread ffffff01aa1aabe0: ffffff00076da0f0
[ ffffff00076da0f0 _resume_from_idle+0xf1() ]
ffffff00076da120 swtch+0x160()
ffffff00076da190 cv_timedwait_sig+0x1bd(ffffff00076da300, ffffffffc0342750,
3b50b0)
ffffff00076da200 waitforack+0x99(ffffff00076da2d0, 10, ffffff00076da5e0, 0)
ffffff00076da290 connmgr_connect+0xfd(ffffff01ad5b3d80, ffffff01adc0f660,
ffffff01ae49eb68, 2, ffffff00076da2d0, ffffff01ad5b3da0, 1,
ffffff00076da5e0, 0)
ffffff00076da390 connmgr_wrapconnect+0x144(ffffff01ad5b3d80,
ffffff00076da5e0, ffffff01ae49eb68, 2, ffffff01ae49eb50, ffffff01ae49eb40, 1, 0)
ffffff00076da4e0 connmgr_get+0x351(0, ffffff00076da5e0, ffffff01ae49eb68, 2,
ffffff01ae49eb50, ffffff01ae49eb40, 2a00000000, 0, ffffffff)
ffffff00076da530 connmgr_wrapget+0x59(0, ffffff00076da5e0, ffffff01ae49eac0)
ffffff00076da6a0 clnt_cots_kcallit+0x22e(ffffff01ae49eac0, 1,
fffffffff8685cd8, ffffff00076da900, fffffffff8685f48, ffffff00076da920, 3c)
ffffff00076da7c0 nfs4_rfscall+0x3f6(ffffff01bc336000, 1, fffffffff8685cd8,
ffffff00076da900, fffffffff8685f48, ffffff00076da920,
ffffff01a8ab6cc8, ffffff00076da8d4, ffffff00076da80c, 0, ffffff01aa75e040)
ffffff00076da870 rfs4call+0xb6(ffffff01bc336000, ffffff00076da900,
ffffff00076da920, ffffff01a8ab6cc8, ffffff00076da8d4, 0, ffffff00076da8f0)
ffffff00076da990 nfs4lookupnew_otw+0x292(ffffff01c7800d40, ffffff00076dabe0,
ffffff00076dabb0, ffffff01a8ab6cc8)
ffffff00076daa00 nfs4lookup+0x20c(ffffff01c7800d40, ffffff00076dabe0,
ffffff00076dabb0, ffffff01a8ab6cc8, 0)
ffffff00076daa80 nfs4_lookup+0xe3(ffffff01c7800d40, ffffff00076dabe0,
ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8
, 0, 0, 0)
ffffff00076dab20 fop_lookup+0xed(ffffff01c7800d40, ffffff00076dabe0,
ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8
, 0, 0, 0)
ffffff00076dad60 lookuppnvp+0x3a3(ffffff00076dae50, ffffff00076dae70, 1, 0,
0, ffffff01a8dca840, ffffff01a8dca840, ffffff01a8ab6cc8)
ffffff00076dae00 lookuppnat+0x12c(ffffff00076dae50, ffffff00076dae70, 1, 0,
0, 0)
ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)
ffffff00076daec0 resolvepath+0x65(8047cf2, 8065f98, 400)
ffffff00076daf10 sys_syscall32+0x101()
### so we see that the umount(1M) command is attempting to resolve
### the path name for the object it should unmount.
### that's usr/src/cmd/fs.d/umount.c: realpath(3C) -> resolvepath(2)
### this triggered an attempt to look up the mount point:
ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)
lookuppn(
    struct pathname *pnp,
    struct pathname *rpnp,
    enum symfollow followlink,
    vnode_t **dirvpp,
    vnode_t **compvpp)
### looking at the path name we attempt to look up confirms this
> ffffff00076dae70/J
0xffffff00076dae70: ffffff01acaa5000
> ffffff01acaa5000/s
0xffffff01acaa5000: /net/xenbld.sfbay/export/xVM-Server
### to recap this was:
ffffff01ade360e8 autofs /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs /net/xenbld.sfbay/export/xVM-Server
### so eventually we'll end up doing a VOP_LOOKUP() OTW via nfs4_lookup(),
### and that will be stuck forever as the network is already down.
### end of game: you gotta power-cycle the box or bring the network back online
### the corresponding mount info structs for V4 show:
> ::nfs4_mntinfo -v
+--------------------------------------+
mntinfo4_t: 0xffffff01bc336000
NFS Version: 4
mi_flags:
MI4_HARD,MI4_PRINTED,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_INACTIVE_IDLE
mi_error: 0
mi_open_files: 0
mi_msg_count: 1
mi_recovflags:
mi_recovthread: 0x0
mi_in_recovery: 0
mount point: /net/xenbld.sfbay/export/xVM-Server
mount from: xenbld.sfbay:/export/xVM-Server
mi_zone=fffffffffbcd1900
mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
mi_acregmin=3000000000, mi_acregmax=60000000000,mi_acdirmin=30000000000,
mi_acdirmax=60000000000
Server list: ffffff01ae1cf4c0
Current Server: ffffff01ae1cf4c0 192.168.79.55:2049
Total: Server Non-responses=24; Server Failovers=0
IO statistics for this mount
No. of bytes read 0
No. of read operations 0
No. of bytes written 0
No. of write operations 0
Async Request queue:
max threads = 8 active threads = 0
number requests queued:
READ_AHEAD = 0 PUTPAGE = 0 PAGEIO = 0 READDIR = 0 INACTIVE = 0
COMMIT = 0
=============================================
Messages queued:
[NFS4]2008 Nov 30 21:38:35: Server xenbld.sfbay not responding, still trying
=============================================
+--------------------------------------+
mntinfo4_t: 0xffffff01bc33d000
NFS Version: 4
mi_flags:
MI4_HARD,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_POSIX_LOCK,MI4_INACTIVE_IDLE
mi_error: 0
mi_open_files: 0
mi_msg_count: 0
mi_recovflags:
mi_recovthread: 0x0
mi_in_recovery: 0
mount point: /net/xenbld.sfbay/export/xVM-Server/builds
mount from: xenbld.sfbay:/export/xVM-Server/builds
mi_zone=fffffffffbcd1900
mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
mi_acregmin=3000000000, mi_acregmax=60000000000,mi_acdirmin=30000000000,
mi_acdirmax=60000000000
Server list: ffffff01ad429100
Current Server: ffffff01ad429100 192.168.79.55:2049
Total: Server Non-responses=0; Server Failovers=0
IO statistics for this mount
No. of bytes read 823785672
No. of read operations 25257
No. of bytes written 0
No. of write operations 0
Async Request queue:
max threads = 8 active threads = 0
number requests queued:
READ_AHEAD = 0 PUTPAGE = 0 PAGEIO = 0 READDIR = 0 INACTIVE = 0
COMMIT = 0
=============================================
Messages queued:
=============================================
### they are not yet unmounted; we know that since neither MI4_DEAD nor MI4_DOWN is set
### the corresponding vfs_t:
> ffffff01ade36698::print vfs_t
{
vfs_next = 0xffffff01ade36018
vfs_prev = 0xffffff01ade360e8
vfs_op = vfssw+0xd38
vfs_vnodecovered = 0xffffff01c77d4240
vfs_flag = 0x2c08
VFS_STATS|VFS_NODEVICES|VFS_XATTR|VFS_NOSETUID - no VFS_UNMOUNTED
business summary: we wait till the end of days in that situation.
<snip end>
---
frankB