First off, thanks for the report.  (Sorry about not getting back to you 
when you emailed me earlier; travel and work intervened.)


On Fri, 24 Jul 2015, cheeky.m wrote:
> # while true ; do mount /dev/sd3a /a ; umount /dev/sd3a ; done
> $ while true ; do ls /a > /dev/null ; done
> 
> After about 45 seconds, the machine will lockup
> It still responds to pings. 
> amd64 in vmware, but also a physical machine will lockup
> takes longer with a sleep 1 between the mount/unmount, but still locks.
...
> No processes are startable when that happens, a running top will keep
> running, but cannot kill the ksh processes.
> 
> top will stay running, but cannot kill the ksh
> 
> PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 28124 test 18 0 648K 748K sleep pause 0:03 0.54% ksh
> 10055 root 18 0 640K 740K sleep pause 0:01 0.34% ksh
> 1 root 10 0 464K 540K idle wait 0:01 0.00% init
> 13800 _x11 2 0 11M 15M idle select 0:01 0.00% Xorg
> 11954 root 10 0 264K 292K idle wait 0:00 0.00% mount
> 
> the two ksh processes are in pause state. mount in wait.
> 
> a looping ps -al to show wchan and state:
> 1000 12949 1924 36 14 0 340 352 vfslock D+ p5 0:00.00 ls /a
> 1000 17565 7901 1 18 0 644 732 pause Is p1 0:00.00 -ksh (ksh)
> 1000 28650 9250 0 18 0 640 732 pause Is p2 0:00.00 -ksh (ksh)
> 0 31431 28650 36 18 0 644 744 pause S p2 0:01.51 -ksh (ksh)
> 0 27668 31431 36 -5 0 244 232 biowait D+ p2 0:00.00 umount /dev/sd3a
> 1000 19565 6707 3 18 0 644 736 pause Is p3 0:00.00 -ksh (ksh)
> 1000 21718 24499 0 18 0 640 736 pause Ss p4 0:00.07 -ksh (ksh)
> 1000 1924 11677 36 18 0 640 732 pause Ss p5 0:00.33 -ksh (ksh)
> 1000 27159 1924 36 14 0 340 344 vfslock D+ p5 0:00.00 ls /a
> 1000 17565 7901 1 18 0 644 732 pause Is p1 0:00.00 -ksh (ksh)
> 1000 28650 9250 0 18 0 640 732 pause Is p2 0:00.00 -ksh (ksh)
> 0 31431 28650 36 18 0 644 744 pause S p2 0:01.52 -ksh (ksh)
> 0 22265 31431 36 -5 0 240 232 biowait D+ p2 0:00.00 umount /dev/sd3a

That last chunk shows the pid on the ls and umount are changing, so it's 
before the hang.  I reporduced the hang and breaking into ddb I found that 
it's a deadlock between an ls process doing fstatat("/mnt") and a mount 
process doing mount().  Typed in from a phone photo:

ddb{0}> trae /p 0518539
sleep_finish() at sleep_finish+0xc4

rw_enter() at rw_enter+0x19d
rrw_enter() at rrw_enter+0x48
VOP_LOCK() at VOP_LOCK+0x37
vn_lock() at vn_lock +0x6c
vget() at vget+0xf1
cache_lookup() at cache_lookup+0x2e6
ufs_lookup() at ufs_lookup+0x10a
VOP_LOOKUP() at VOP_LOOKUP+0x39
vfs_lookup() at vfs_lookup+0x374
namei() at namei+0x241
dofstatat() at dofstatat+0x89
syscall() at syscall+0x1bc
--- syscall (number 42) ---
...

ddb{0}> trace /p 0t20363
sleep_finish() at sleep_finish+0xv4
rw_enter() at rw_enter+0x19d
rrw_enter() at rrw_enter+0x48
VOP_LOCK() at VOP_LOCK+0x37
vn_lock() at vn_lock +0x6c
vfs_lookup() at vfs_lookup+0x8c
namei() at namei+0x241
ffs_mount() at ffs_mount+0x2cb
sys_mount() at sys_mount+0x344
syscall() at syscall+0x1bc
--- syscall (number 21) ---
end of kernel
...


So, the problem is that sys_mount() locks the vnode of the directory to be 
mounted on (/mnt), and then tries to look up the device to be mounted 
(/dev/sd0j in my case) for which vfs_lookup() tries to momentarily grab 
the lock on "/".  Grabbing a lock higher in the tree while you hold a 
lower one is a no-no; here it deadlocks with ls's fstatat() holding the 
lock on "/" while it blocks trying to get the lock on "/mnt"

I think we need to change sys_mount() to not use LOCKLEAF the first time, 
not do most of the checks at the start but rather look up the path again 
with LOCKLEAF _after_ the VFS_MOUNT() call, error if it's changed from the 
first time, and do the previous checks now that all the requisite paths 
are resolved in a safe order.

This will take a bit of work...


Philip Guenther

Reply via email to