First off, thanks for the report. (Sorry about not getting back to you
when you emailed me earlier; travel and work intervened.)
On Fri, 24 Jul 2015, cheeky.m wrote:
> # while true ; do mount /dev/sd3a /a ; umount /dev/sd3a ; done
> $ while true ; do ls /a > /dev/null ; done
>
> After about 45 seconds, the machine will lockup
> It still responds to pings.
> amd64 in vmware, but also a physical machine will lockup
> takes longer with a sleep 1 between the mount/unmount, but still locks.
...
> No processes are startable when that happens, a running top will keep
> running, but cannot kill the ksh processes.
>
> top will stay running, but cannot kill the ksh
>
> PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 28124 test 18 0 648K 748K sleep pause 0:03 0.54% ksh
> 10055 root 18 0 640K 740K sleep pause 0:01 0.34% ksh
> 1 root 10 0 464K 540K idle wait 0:01 0.00% init
> 13800 _x11 2 0 11M 15M idle select 0:01 0.00% Xorg
> 11954 root 10 0 264K 292K idle wait 0:00 0.00% mount
>
> the two ksh processes are in pause state. mount in wait.
>
> a looping ps -al to show wchan and state:
> 1000 12949 1924 36 14 0 340 352 vfslock D+ p5 0:00.00 ls /a
> 1000 17565 7901 1 18 0 644 732 pause Is p1 0:00.00 -ksh (ksh)
> 1000 28650 9250 0 18 0 640 732 pause Is p2 0:00.00 -ksh (ksh)
> 0 31431 28650 36 18 0 644 744 pause S p2 0:01.51 -ksh (ksh)
> 0 27668 31431 36 -5 0 244 232 biowait D+ p2 0:00.00 umount /dev/sd3a
> 1000 19565 6707 3 18 0 644 736 pause Is p3 0:00.00 -ksh (ksh)
> 1000 21718 24499 0 18 0 640 736 pause Ss p4 0:00.07 -ksh (ksh)
> 1000 1924 11677 36 18 0 640 732 pause Ss p5 0:00.33 -ksh (ksh)
> 1000 27159 1924 36 14 0 340 344 vfslock D+ p5 0:00.00 ls /a
> 1000 17565 7901 1 18 0 644 732 pause Is p1 0:00.00 -ksh (ksh)
> 1000 28650 9250 0 18 0 640 732 pause Is p2 0:00.00 -ksh (ksh)
> 0 31431 28650 36 18 0 644 744 pause S p2 0:01.52 -ksh (ksh)
> 0 22265 31431 36 -5 0 240 232 biowait D+ p2 0:00.00 umount /dev/sd3a
That last chunk shows the pid on the ls and umount are changing, so it's
before the hang. I reporduced the hang and breaking into ddb I found that
it's a deadlock between an ls process doing fstatat("/mnt") and a mount
process doing mount(). Typed in from a phone photo:
ddb{0}> trae /p 0518539
sleep_finish() at sleep_finish+0xc4
rw_enter() at rw_enter+0x19d
rrw_enter() at rrw_enter+0x48
VOP_LOCK() at VOP_LOCK+0x37
vn_lock() at vn_lock +0x6c
vget() at vget+0xf1
cache_lookup() at cache_lookup+0x2e6
ufs_lookup() at ufs_lookup+0x10a
VOP_LOOKUP() at VOP_LOOKUP+0x39
vfs_lookup() at vfs_lookup+0x374
namei() at namei+0x241
dofstatat() at dofstatat+0x89
syscall() at syscall+0x1bc
--- syscall (number 42) ---
...
ddb{0}> trace /p 0t20363
sleep_finish() at sleep_finish+0xv4
rw_enter() at rw_enter+0x19d
rrw_enter() at rrw_enter+0x48
VOP_LOCK() at VOP_LOCK+0x37
vn_lock() at vn_lock +0x6c
vfs_lookup() at vfs_lookup+0x8c
namei() at namei+0x241
ffs_mount() at ffs_mount+0x2cb
sys_mount() at sys_mount+0x344
syscall() at syscall+0x1bc
--- syscall (number 21) ---
end of kernel
...
So, the problem is that sys_mount() locks the vnode of the directory to be
mounted on (/mnt), and then tries to look up the device to be mounted
(/dev/sd0j in my case) for which vfs_lookup() tries to momentarily grab
the lock on "/". Grabbing a lock higher in the tree while you hold a
lower one is a no-no; here it deadlocks with ls's fstatat() holding the
lock on "/" while it blocks trying to get the lock on "/mnt"
I think we need to change sys_mount() to not use LOCKLEAF the first time,
not do most of the checks at the start but rather look up the path again
with LOCKLEAF _after_ the VFS_MOUNT() call, error if it's changed from the
first time, and do the previous checks now that all the requisite paths
are resolved in a safe order.
This will take a bit of work...
Philip Guenther