On Tue, 02 Dec 2008 10:24:59 +0100, Pavel Filipensky <Pavel.Filipensky at sun.com> wrote:
> usefully attempt an all-zones umountall'. Fix for "6675447 NFSv4 client
> hangs on shutdown if server is down beforehand" has added the '-l' flag
> (limit actions to the local file systems) to svc.startd:
> system("/sbin/umountall -l"); with this fix, we no longer unmount NFS
> there.

A bit off topic in the context of this proposal, but still... FWIW, I have
to disagree here: despite the assumption above, I can show you that
'umountall -l' still does attempt to unmount NFS file systems. See:

    http://bugs.opensolaris.org/view_bug.do?bug_id=6544130

Unfortunately my latest update from yesterday is not yet public. What is
happening is that 'umountall -l' does indeed trigger a 'umount -a' for
autofs-triggered NFS mounts that are still in mnttab, regardless of the
theory stated above.
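Just to make that precondition concrete (this is only a quick sketch of
mine, not code taken from umountall): the autofs-triggered NFS mounts are
perfectly ordinary mnttab entries until autofs gets around to unmounting
them, so anything that sweeps mnttab at shutdown time will see them:

/*
 * Sketch only: list the NFS entries a shutdown-time sweep over mnttab
 * would see.  NFS mounts triggered under an autofs map (/net, /home)
 * stay listed here until autofs unmounts them.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mnttab.h>

int
main(void)
{
        FILE            *fp;
        struct mnttab   mt;

        if ((fp = fopen(MNTTAB, "r")) == NULL) {        /* /etc/mnttab */
                perror("fopen");
                return (1);
        }
        while (getmntent(fp, &mt) == 0) {
                /* fstype is "nfs" for v4 mounts as well */
                if (strcmp(mt.mnt_fstype, "nfs") == 0)
                        (void) printf("%s\t%s\n", mt.mnt_mountp,
                            mt.mnt_special);
        }
        (void) fclose(fp);
        return (0);
}

With the server already down, those entries are exactly what the
'umount -a' invocation then gets handed.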
Re-doing my update to the above bug here to show you the picture:

<snip>

So I was finally able to gather a crash dump of this event. See the
comments section for the path and the attached threadlist. The picture,
when the network was down and a shutdown was initiated (after the network
had been taken down, but before the idle autofs NFS mounts had been
unmounted), looked like this:

> ::status
debugging crash dump vmcore.0 (64-bit) from opteron
operating system: 5.11 snv_102 (i86pc)

### autofs/nfs mounts:
ffffff01ade36eb8 autofs /net
ffffff01ade36de8 autofs /home
ffffff01ade36428 autofs /net/xenbld.sfbay/sp1
ffffff01ade36358 autofs /net/xenbld.sfbay/dskpool
ffffff01ade36288 autofs /net/xenbld.sfbay/export
ffffff01ade361b8 autofs /net/xenbld.sfbay/export/build
ffffff01ade360e8 autofs /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs    /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36018 autofs /net/xenbld.sfbay/export/xVM-Server/gates
ffffff01cd891ec0 autofs /net/xenbld.sfbay/export/xVM-Server/builds
ffffff01cd891df0 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i14
ffffff01cd891d20 autofs /net/xenbld.sfbay/export/xVM-Server/builds-i14
ffffff01cd891c50 autofs /net/xenbld.sfbay/export/xVM-Server/builds-i13
ffffff01cd891b80 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i12b7
ffffff01cd891ab0 autofs /net/xenbld.sfbay/export/xVM-Server/gates-i1...
ffffff01cd8919e0 autofs /net/xenbld.sfbay/export/xVM-Server/producti...
ffffff01cd891910 nfs    /net/xenbld.sfbay/export/xVM-Server/builds

### same messages on the console about unkillable autofs services and
### nfs client mounts, and:
NOTICE: [NFS4][Server: xenbld.sfbay][Mntpt: /net/xenbld.sfbay/export/xVM-Server]NFS server xenbld.sfbay not responding; still trying

### process tree still active:
> ::ptree
fffffffffbc29430 sched
ffffff01a99cba48 fsflush
ffffff01a99cc6a8 pageout
ffffff01a99cd308 init
ffffff01ae948320 automountd
ffffff01ae93c8e8 automountd
ffffff01a99b4538 powerd
ffffff01a99c9528 svc.configd
ffffff01a99ca188 svc.startd
ffffff01a99c4a50 sh
ffffff01a99c56b0 umountall
ffffff01a99b8318 umountall
ffffff01a99c0c70 umount
ffffff01ae9438e0 tail

### what's umountall doing?
> ffffff01a99b8318::ps -tf
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3224   3212      7      7      0 0x42000000 ffffff01a99b8318 /sbin/sh /sbin/umountall -l
        T  0xffffff01aa1b4820 <TS_SLEEP>

> 0xffffff01aa1b4820::findstack -v
stack pointer for thread ffffff01aa1b4820: ffffff0007ea9c70
[ ffffff0007ea9c70 _resume_from_idle+0xf1() ]
  ffffff0007ea9ca0 swtch+0x160()
  ffffff0007ea9d00 cv_wait_sig_swap_core+0x170(ffffff01a99b83d8, fffffffffbcd65f8, 0)
  ffffff0007ea9d20 cv_wait_sig_swap+0x18(ffffff01a99b83d8, fffffffffbcd65f8)
  ffffff0007ea9dc0 waitid+0x2a0(0, c9a, ffffff0007ea9dd0, 83)
  ffffff0007ea9ec0 waitsys32+0x30(0, c9a, 8047560, 83)
  ffffff0007ea9f10 sys_syscall32+0x101()

### what's the currently active umount doing?
> ffffff01a99c0c70::ps -tf
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3226   3224      7      7      0 0x4a004000 ffffff01a99c0c70 umount -a /net/xenbld.sfbay/export/xVM-Server/production-IPS-server /net/xenbld
        T  0xffffff01aa1aabe0 <TS_SLEEP>

> 0xffffff01aa1aabe0::findstack -v
stack pointer for thread ffffff01aa1aabe0: ffffff00076da0f0
[ ffffff00076da0f0 _resume_from_idle+0xf1() ]
  ffffff00076da120 swtch+0x160()
  ffffff00076da190 cv_timedwait_sig+0x1bd(ffffff00076da300, ffffffffc0342750, 3b50b0)
  ffffff00076da200 waitforack+0x99(ffffff00076da2d0, 10, ffffff00076da5e0, 0)
  ffffff00076da290 connmgr_connect+0xfd(ffffff01ad5b3d80, ffffff01adc0f660, ffffff01ae49eb68, 2, ffffff00076da2d0, ffffff01ad5b3da0, 1, ffffff00076da5e0, 0)
  ffffff00076da390 connmgr_wrapconnect+0x144(ffffff01ad5b3d80, ffffff00076da5e0, ffffff01ae49eb68, 2, ffffff01ae49eb50, ffffff01ae49eb40, 1, 0)
  ffffff00076da4e0 connmgr_get+0x351(0, ffffff00076da5e0, ffffff01ae49eb68, 2, ffffff01ae49eb50, ffffff01ae49eb40, 2a00000000, 0, ffffffff)
  ffffff00076da530 connmgr_wrapget+0x59(0, ffffff00076da5e0, ffffff01ae49eac0)
  ffffff00076da6a0 clnt_cots_kcallit+0x22e(ffffff01ae49eac0, 1, fffffffff8685cd8, ffffff00076da900, fffffffff8685f48, ffffff00076da920, 3c)
  ffffff00076da7c0 nfs4_rfscall+0x3f6(ffffff01bc336000, 1, fffffffff8685cd8, ffffff00076da900, fffffffff8685f48, ffffff00076da920, ffffff01a8ab6cc8, ffffff00076da8d4, ffffff00076da80c, 0, ffffff01aa75e040)
  ffffff00076da870 rfs4call+0xb6(ffffff01bc336000, ffffff00076da900, ffffff00076da920, ffffff01a8ab6cc8, ffffff00076da8d4, 0, ffffff00076da8f0)
  ffffff00076da990 nfs4lookupnew_otw+0x292(ffffff01c7800d40, ffffff00076dabe0, ffffff00076dabb0, ffffff01a8ab6cc8)
  ffffff00076daa00 nfs4lookup+0x20c(ffffff01c7800d40, ffffff00076dabe0, ffffff00076dabb0, ffffff01a8ab6cc8, 0)
  ffffff00076daa80 nfs4_lookup+0xe3(ffffff01c7800d40, ffffff00076dabe0, ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8, 0, 0, 0)
  ffffff00076dab20 fop_lookup+0xed(ffffff01c7800d40, ffffff00076dabe0, ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8, 0, 0, 0)
  ffffff00076dad60 lookuppnvp+0x3a3(ffffff00076dae50, ffffff00076dae70, 1, 0, 0, ffffff01a8dca840, ffffff01a8dca840, ffffff01a8ab6cc8)
  ffffff00076dae00 lookuppnat+0x12c(ffffff00076dae50, ffffff00076dae70, 1, 0, 0, 0)
  ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)
  ffffff00076daec0 resolvepath+0x65(8047cf2, 8065f98, 400)
  ffffff00076daf10 sys_syscall32+0x101()

### so we see that the umount(1M) command is attempting to resolve
### the path name for the object it should unmount.
### that's usr/src/cmd/fs.d/umount.c:realpath(3C) -> resolvepath(2)
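To show why that alone is enough to hang: the path resolution itself
already needs the dead server. Roughly what happens there (a sketch of
mine, not the actual umount.c code) boils down to a realpath(3C) call on
the mount point, which goes through resolvepath(2); with a hard NFS mount
and the server unreachable, that call never returns:

/*
 * Sketch only, not the real usr/src/cmd/fs.d/umount.c code: resolving
 * the mount-point path drags us through the covered vnode and over the
 * wire.  Run it against a hard NFS mount whose server is down and it
 * blocks in resolvepath(2) just like the umount in the dump.
 */
#include <stdio.h>
#include <limits.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        char    resolved[PATH_MAX + 1];
        int     n;

        if (argc != 2) {
                (void) fprintf(stderr, "usage: %s <mount-point>\n", argv[0]);
                return (2);
        }
        /* realpath(3C) ends up here too; hangs while the server is down */
        n = resolvepath(argv[1], resolved, sizeof (resolved) - 1);
        if (n < 0) {
                perror("resolvepath");
                return (1);
        }
        resolved[n] = '\0';     /* resolvepath() does not null-terminate */
        (void) printf("%s -> %s\n", argv[1], resolved);
        return (0);
}

Point it at /net/xenbld.sfbay/export/xVM-Server while the server is
unreachable and you sit in exactly the resolvepath()/lookuppn() stack
shown above.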
### this triggered an attempt to lookup the mount point:
  ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)

    120 lookuppn(
    121     struct pathname *pnp,
    122     struct pathname *rpnp,
    123     enum symfollow followlink,
    124     vnode_t **dirvpp,
    125     vnode_t **compvpp)

### looking at the path name we attempt to lookup confirms this
> ffffff00076dae70/J
0xffffff00076dae70:             ffffff01acaa5000
> ffffff01acaa5000/s
0xffffff01acaa5000:             /net/xenbld.sfbay/export/xVM-Server

### to recap, this was:
ffffff01ade360e8 autofs /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs    /net/xenbld.sfbay/export/xVM-Server

### so eventually we'll end up doing a VOP_LOOKUP() OTW via nfs4_lookup(),
### and that will be stuck forever, as the network is already down.
### end of game: you gotta powercycle the box or bring the network online again.

### the corresponding mount info structs for V4 show:
> ::nfs4_mntinfo -v
+--------------------------------------+
    mntinfo4_t:    0xffffff01bc336000
    NFS Version:   4
    mi_flags:      MI4_HARD,MI4_PRINTED,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_INACTIVE_IDLE
    mi_error:      0
    mi_open_files: 0
    mi_msg_count:  1
    mi_recovflags:
    mi_recovthread: 0x0
    mi_in_recovery: 0
    mount point:   /net/xenbld.sfbay/export/xVM-Server
    mount from:    xenbld.sfbay:/export/xVM-Server
    mi_zone=fffffffffbcd1900
    mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
    mi_acregmin=3000000000, mi_acregmax=60000000000, mi_acdirmin=30000000000, mi_acdirmax=60000000000
    Server list: ffffff01ae1cf4c0
    Current Server: ffffff01ae1cf4c0 192.168.79.55:2049
    Total: Server Non-responses=24; Server Failovers=0
    IO statistics for this mount
        No. of bytes read        0
        No. of read operations   0
        No. of bytes written     0
        No. of write operations  0
    Async Request queue:
        max threads = 8
        active threads = 0
        number requests queued:
            READ_AHEAD = 0 PUTPAGE = 0 PAGEIO = 0
            READDIR = 0 INACTIVE = 0 COMMIT = 0
    =============================================
    Messages queued:
    [NFS4]2008 Nov 30 21:38:35: Server xenbld.sfbay not responding, still trying
    =============================================
+--------------------------------------+
    mntinfo4_t:    0xffffff01bc33d000
    NFS Version:   4
    mi_flags:      MI4_HARD,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_POSIX_LOCK,MI4_INACTIVE_IDLE
    mi_error:      0
    mi_open_files: 0
    mi_msg_count:  0
    mi_recovflags:
    mi_recovthread: 0x0
    mi_in_recovery: 0
    mount point:   /net/xenbld.sfbay/export/xVM-Server/builds
    mount from:    xenbld.sfbay:/export/xVM-Server/builds
    mi_zone=fffffffffbcd1900
    mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
    mi_acregmin=3000000000, mi_acregmax=60000000000, mi_acdirmin=30000000000, mi_acdirmax=60000000000
    Server list: ffffff01ad429100
    Current Server: ffffff01ad429100 192.168.79.55:2049
    Total: Server Non-responses=0; Server Failovers=0
    IO statistics for this mount
        No. of bytes read        823785672
        No. of read operations   25257
        No. of bytes written     0
        No. of write operations  0
    Async Request queue:
        max threads = 8
        active threads = 0
        number requests queued:
            READ_AHEAD = 0 PUTPAGE = 0 PAGEIO = 0
            READDIR = 0 INACTIVE = 0 COMMIT = 0
    =============================================
    Messages queued:
    =============================================

### they are not yet unmounted, we know that; no MI4_DEAD or MI4_DOWN.

### the corresponding vfs_t:
> ffffff01ade36698::print vfs_t
{
    vfs_next = 0xffffff01ade36018
    vfs_prev = 0xffffff01ade360e8
    vfs_op = vfssw+0xd38
    vfs_vnodecovered = 0xffffff01c77d4240
    vfs_flag = 0x2c08            VFS_STATS|VFS_NODEVICES|VFS_XATTR|VFS_NOSETUID
                                 - no VFS_UNMOUNTED business

Summary: we wait until the end of days in that situation.

<snip end>

---
frankB