On Tue, 02 Dec 2008 10:24:59 +0100, Pavel Filipensky <Pavel.Filipensky at 
sun.com> wrote:

> usefully attempt an all-zones umountall'. Fix for "6675447  NFSv4 client
> hangs on shutdown if server is down beforehand"-  has added the '-l' flag
> (limit actions to the local file systems) to svc.startd:
>    system("/sbin/umountall -l"). With this fix, we no longer unmount NFS there.

a bit off topic in the context of this proposal, but still...
fwiw, I have to disagree here: despite the assumption above, I can
show you that 'umountall -l' still does attempt to unmount NFS file systems.

see: http://bugs.opensolaris.org/view_bug.do?bug_id=6544130

unfortunately my latest update from yesterday is not yet public.
what is happening is that 'umountall -l' does indeed trigger a 'umount -a'
for autofs-triggered NFS mounts still listed in mnttab, regardless of the
statement above.
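
for illustration, the sweep's selection can be mimicked in a few lines of
C against mnttab; a minimal sketch (assuming the usual getmntent(3C)
interface), listing every NFS entry still recorded there, autofs-triggered
or not:

  #include <stdio.h>
  #include <string.h>
  #include <sys/mnttab.h>

  /*
   * List the NFS entries still present in mnttab -- these are
   * exactly the mounts a umountall-style sweep will hand to
   * umount, whether or not they were triggered by autofs.
   */
  int
  main(void)
  {
          FILE *fp = fopen(MNTTAB, "r");
          struct mnttab mt;

          if (fp == NULL)
                  return (1);
          while (getmntent(fp, &mt) == 0) {
                  if (strcmp(mt.mnt_fstype, "nfs") == 0)
                          (void) printf("%s on %s\n",
                              mt.mnt_special, mt.mnt_mountp);
          }
          (void) fclose(fp);
          return (0);
  }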

re-doing my update to the above bug here to show you the picture:

<snip>
so I was finally able to gather a crash dump of this event. see the
comments section of the bug for the path and the attached thread list.

the picture, with a shutdown initiated after the network had been taken
down but before the idle autofs NFS mounts had been unmounted,
looked like this:

> ::status
debugging crash dump vmcore.0 (64-bit) from opteron
operating system: 5.11 snv_102 (i86pc)

### autofs/nfs mounts:

ffffff01ade36eb8 autofs          /net
ffffff01ade36de8 autofs          /home
ffffff01ade36428 autofs          /net/xenbld.sfbay/sp1
ffffff01ade36358 autofs          /net/xenbld.sfbay/dskpool
ffffff01ade36288 autofs          /net/xenbld.sfbay/export
ffffff01ade361b8 autofs          /net/xenbld.sfbay/export/build
ffffff01ade360e8 autofs          /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs             /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36018 autofs          /net/xenbld.sfbay/export/xVM-Server/gates
ffffff01cd891ec0 autofs          /net/xenbld.sfbay/export/xVM-Server/builds
ffffff01cd891df0 autofs          /net/xenbld.sfbay/export/xVM-Server/gates-i14
ffffff01cd891d20 autofs          /net/xenbld.sfbay/export/xVM-Server/builds-i14
ffffff01cd891c50 autofs          /net/xenbld.sfbay/export/xVM-Server/builds-i13
ffffff01cd891b80 autofs          /net/xenbld.sfbay/export/xVM-Server/gates-i12b7
ffffff01cd891ab0 autofs          /net/xenbld.sfbay/export/xVM-Server/gates-i1...
ffffff01cd8919e0 autofs          /net/xenbld.sfbay/export/xVM-Server/producti...
ffffff01cd891910 nfs             /net/xenbld.sfbay/export/xVM-Server/builds

### the same messages on the console about unkillable autofs services and
### NFS client mounts, and:

NOTICE: [NFS4][Server: xenbld.sfbay][Mntpt: 
/net/xenbld.sfbay/export/xVM-Server]NFS server xenbld.sfbay not responding; 
still trying

### process tree still active:

> ::ptree
fffffffffbc29430  sched
     ffffff01a99cba48  fsflush
     ffffff01a99cc6a8  pageout
     ffffff01a99cd308  init
          ffffff01ae948320  automountd
               ffffff01ae93c8e8  automountd
          ffffff01a99b4538  powerd
          ffffff01a99c9528  svc.configd
          ffffff01a99ca188  svc.startd
               ffffff01a99c4a50  sh
                    ffffff01a99c56b0  umountall
                         ffffff01a99b8318  umountall
                              ffffff01a99c0c70  umount
                              ffffff01ae9438e0  tail

### what's umountall doing?

> ffffff01a99b8318::ps -tf
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3224   3212      7      7      0 0x42000000 ffffff01a99b8318 /sbin/sh 
/sbin/umountall -l
        T  0xffffff01aa1b4820 <TS_SLEEP>

> 0xffffff01aa1b4820::findstack -v
stack pointer for thread ffffff01aa1b4820: ffffff0007ea9c70
[ ffffff0007ea9c70 _resume_from_idle+0xf1() ]
  ffffff0007ea9ca0 swtch+0x160()
  ffffff0007ea9d00 cv_wait_sig_swap_core+0x170(ffffff01a99b83d8, 
fffffffffbcd65f8, 0)
  ffffff0007ea9d20 cv_wait_sig_swap+0x18(ffffff01a99b83d8, fffffffffbcd65f8)
  ffffff0007ea9dc0 waitid+0x2a0(0, c9a, ffffff0007ea9dd0, 83)
  ffffff0007ea9ec0 waitsys32+0x30(0, c9a, 8047560, 83)
  ffffff0007ea9f10 sys_syscall32+0x101()
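
i.e. the umountall shell is just parked in waitid(2), blocking until its
child exits. a minimal userland sketch of that wait (the sleeping child
stands in for the hung umount; entirely made up for illustration):

  #include <stdio.h>
  #include <signal.h>
  #include <unistd.h>
  #include <sys/wait.h>

  /*
   * The shell's kernel stack above amounts to nothing more than
   * this: fork a child, then block in waitid(2) until it exits --
   * which, in the crash dump, it never does.
   */
  int
  main(void)
  {
          siginfo_t info;
          pid_t pid = fork();

          if (pid == 0) {         /* child: stand-in for umount */
                  (void) sleep(60);
                  _exit(0);
          }
          /* parent parks here -- cv_wait_sig_swap in the dump */
          (void) waitid(P_PID, (id_t)pid, &info, WEXITED);
          (void) printf("child %d exited\n", (int)info.si_pid);
          return (0);
  }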

### what's the currently active umount doing?

> ffffff01a99c0c70::ps -tf
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3226   3224      7      7      0 0x4a004000 ffffff01a99c0c70 umount -a 
/net/xenbld.sfbay/export/xVM-Server/production-IPS-server /net/xenbld
        T  0xffffff01aa1aabe0 <TS_SLEEP>

> 0xffffff01aa1aabe0::findstack -v
stack pointer for thread ffffff01aa1aabe0: ffffff00076da0f0
[ ffffff00076da0f0 _resume_from_idle+0xf1() ]
  ffffff00076da120 swtch+0x160()
  ffffff00076da190 cv_timedwait_sig+0x1bd(ffffff00076da300, ffffffffc0342750, 
3b50b0)
  ffffff00076da200 waitforack+0x99(ffffff00076da2d0, 10, ffffff00076da5e0, 0)
  ffffff00076da290 connmgr_connect+0xfd(ffffff01ad5b3d80, ffffff01adc0f660, 
ffffff01ae49eb68, 2, ffffff00076da2d0, ffffff01ad5b3da0, 1,
  ffffff00076da5e0, 0)
  ffffff00076da390 connmgr_wrapconnect+0x144(ffffff01ad5b3d80, 
ffffff00076da5e0, ffffff01ae49eb68, 2, ffffff01ae49eb50, ffffff01ae49eb40, 1, 0)
  ffffff00076da4e0 connmgr_get+0x351(0, ffffff00076da5e0, ffffff01ae49eb68, 2, 
ffffff01ae49eb50, ffffff01ae49eb40, 2a00000000, 0, ffffffff)
  ffffff00076da530 connmgr_wrapget+0x59(0, ffffff00076da5e0, ffffff01ae49eac0)
  ffffff00076da6a0 clnt_cots_kcallit+0x22e(ffffff01ae49eac0, 1, 
fffffffff8685cd8, ffffff00076da900, fffffffff8685f48, ffffff00076da920, 3c)
  ffffff00076da7c0 nfs4_rfscall+0x3f6(ffffff01bc336000, 1, fffffffff8685cd8, 
ffffff00076da900, fffffffff8685f48, ffffff00076da920,
  ffffff01a8ab6cc8, ffffff00076da8d4, ffffff00076da80c, 0, ffffff01aa75e040)
  ffffff00076da870 rfs4call+0xb6(ffffff01bc336000, ffffff00076da900, 
ffffff00076da920, ffffff01a8ab6cc8, ffffff00076da8d4, 0, ffffff00076da8f0)
  ffffff00076da990 nfs4lookupnew_otw+0x292(ffffff01c7800d40, ffffff00076dabe0, 
ffffff00076dabb0, ffffff01a8ab6cc8)
  ffffff00076daa00 nfs4lookup+0x20c(ffffff01c7800d40, ffffff00076dabe0, 
ffffff00076dabb0, ffffff01a8ab6cc8, 0)
  ffffff00076daa80 nfs4_lookup+0xe3(ffffff01c7800d40, ffffff00076dabe0, 
ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8
  , 0, 0, 0)
  ffffff00076dab20 fop_lookup+0xed(ffffff01c7800d40, ffffff00076dabe0, 
ffffff00076dabb0, ffffff00076dae50, 0, ffffff01a8dca840, ffffff01a8ab6cc8
  , 0, 0, 0)
  ffffff00076dad60 lookuppnvp+0x3a3(ffffff00076dae50, ffffff00076dae70, 1, 0, 
0, ffffff01a8dca840, ffffff01a8dca840, ffffff01a8ab6cc8)
  ffffff00076dae00 lookuppnat+0x12c(ffffff00076dae50, ffffff00076dae70, 1, 0, 
0, 0)
  ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)
  ffffff00076daec0 resolvepath+0x65(8047cf2, 8065f98, 400)
  ffffff00076daf10 sys_syscall32+0x101()

### so we see that the umount(1M) command is attempting to resolve
### the path name of the object it should unmount.
### that's usr/src/cmd/fs.d/umount.c: realpath(3C) -> resolvepath(2)
### this triggered an attempt to look up the mount point:

ffffff00076dae40 lookuppn+0x28(ffffff00076dae50, ffffff00076dae70, 1, 0, 0)

 lookuppn(
         struct pathname *pnp,
         struct pathname *rpnp,
         enum symfollow followlink,
         vnode_t **dirvpp,
         vnode_t **compvpp)

### looking at the path name we attempt to look up confirms this:

> ffffff00076dae70/J
0xffffff00076dae70:             ffffff01acaa5000
> ffffff01acaa5000/s
0xffffff01acaa5000:             /net/xenbld.sfbay/export/xVM-Server

### to recap, this was:

ffffff01ade360e8 autofs          /net/xenbld.sfbay/export/xVM-Server
ffffff01ade36698 nfs             /net/xenbld.sfbay/export/xVM-Server

### so eventually we'll end up doing a VOP_LOOKUP() OTW via nfs4_lookup(),
### and that will be stuck forever, as the network is already down.
### end of game: you have to power-cycle the box or bring the network back online
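
fwiw, the hang should be reproducible from userland with nothing more
than a path lookup on such a mount point while the server is unreachable;
a minimal sketch (the mount point is the one from the dump above):

  #include <stdio.h>
  #include <stdlib.h>
  #include <limits.h>

  /*
   * umount.c canonicalizes its argument before unmounting:
   * realpath(3C) ends up in resolvepath(2), which walks the path
   * through the kernel lookup code shown in the stack above.  With
   * a hard NFS mount and a dead server, this call never returns.
   */
  int
  main(void)
  {
          char resolved[PATH_MAX];

          if (realpath("/net/xenbld.sfbay/export/xVM-Server",
              resolved) == NULL) {
                  perror("realpath");
                  return (1);
          }
          (void) printf("resolved to %s\n", resolved);
          return (0);
  }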

### the corresponding mount info structs for V4 show:

> ::nfs4_mntinfo -v
+--------------------------------------+
    mntinfo4_t: 0xffffff01bc336000
   NFS Version: 4
      mi_flags: 
MI4_HARD,MI4_PRINTED,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_INACTIVE_IDLE
      mi_error: 0
 mi_open_files: 0
  mi_msg_count: 1
 mi_recovflags:
mi_recovthread: 0x0
mi_in_recovery: 0
   mount point: /net/xenbld.sfbay/export/xVM-Server
    mount from: xenbld.sfbay:/export/xVM-Server
mi_zone=fffffffffbcd1900
mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
mi_acregmin=3000000000, mi_acregmax=60000000000,mi_acdirmin=30000000000, 
mi_acdirmax=60000000000
 Server list: ffffff01ae1cf4c0

 Current Server: ffffff01ae1cf4c0 192.168.79.55:2049

  Total: Server Non-responses=24; Server Failovers=0
IO statistics for this mount
        No. of bytes read               0
        No. of read operations          0
        No. of bytes written            0
        No. of write operations         0
 Async Request queue:
     max threads = 8 active threads = 0
     number requests queued:
     READ_AHEAD = 0    PUTPAGE = 0     PAGEIO = 0    READDIR = 0   INACTIVE = 0 
    COMMIT = 0
=============================================
Messages queued:
[NFS4]2008 Nov 30 21:38:35: Server xenbld.sfbay not responding, still trying
=============================================
+--------------------------------------+
    mntinfo4_t: 0xffffff01bc33d000
   NFS Version: 4
      mi_flags: 
MI4_HARD,MI4_INT,MI4_LINK,MI4_SYMLINK,MI4_ACL,MI4_POSIX_LOCK,MI4_INACTIVE_IDLE
      mi_error: 0
 mi_open_files: 0
  mi_msg_count: 0
 mi_recovflags:
mi_recovthread: 0x0
mi_in_recovery: 0
   mount point: /net/xenbld.sfbay/export/xVM-Server/builds
    mount from: xenbld.sfbay:/export/xVM-Server/builds
mi_zone=fffffffffbcd1900
mi_curread=1048576, mi_curwrite=1048576, mi_retrans=5, mi_timeo=600
mi_acregmin=3000000000, mi_acregmax=60000000000,mi_acdirmin=30000000000, 
mi_acdirmax=60000000000
 Server list: ffffff01ad429100

 Current Server: ffffff01ad429100 192.168.79.55:2049

  Total: Server Non-responses=0; Server Failovers=0
IO statistics for this mount
        No. of bytes read         823785672
        No. of read operations      25257
        No. of bytes written            0
        No. of write operations         0
 Async Request queue:
     max threads = 8 active threads = 0
     number requests queued:
     READ_AHEAD = 0    PUTPAGE = 0     PAGEIO = 0    READDIR = 0   INACTIVE = 0 
    COMMIT = 0
=============================================
Messages queued:
=============================================

### they are not yet unmounted, we know that: no MI4_DEAD or MI4_DOWN set
### the corresponding vfs_t:

> ffffff01ade36698::print vfs_t
{
    vfs_next = 0xffffff01ade36018
    vfs_prev = 0xffffff01ade360e8
    vfs_op = vfssw+0xd38
    vfs_vnodecovered = 0xffffff01c77d4240
    vfs_flag = 0x2c08

### vfs_flag decodes to VFS_STATS|VFS_NODEVICES|VFS_XATTR|VFS_NOSETUID
### -- no VFS_UNMOUNTED
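
for the record, the bit arithmetic behind that decode; flag values copied
from snv-era <sys/vfs.h> (treat VFS_UNMOUNTED = 0x100 in particular as an
assumption to be verified against your tree):

  #include <stdio.h>

  /*
   * Values as in snv-era <sys/vfs.h>; defined here by hand so the
   * sketch compiles without kernel headers.
   */
  #define VFS_NOSETUID    0x08
  #define VFS_UNMOUNTED   0x100   /* assumption -- verify */
  #define VFS_XATTR       0x400
  #define VFS_NODEVICES   0x800
  #define VFS_STATS       0x2000

  int
  main(void)
  {
          unsigned int flag = 0x2c08;     /* vfs_flag from the dump */

          (void) printf("STATS %d NODEVICES %d XATTR %d "
              "NOSETUID %d UNMOUNTED %d\n",
              !!(flag & VFS_STATS), !!(flag & VFS_NODEVICES),
              !!(flag & VFS_XATTR), !!(flag & VFS_NOSETUID),
              !!(flag & VFS_UNMOUNTED));
          return (0);
  }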

business summary: we wait till the end of days in that situation.
<snip end>

---
frankB


