On Sat, Nov 15, 2025 at 05:18:57AM +0100, Mateusz Guzik wrote: > On Sat, Nov 15, 2025 at 3:12 AM John Baldwin <[email protected]> wrote: > > > > On 11/11/25 09:59, Mark Johnston wrote: > > > The branch main has been updated by markj: > > > > > > URL: > > > https://cgit.FreeBSD.org/src/commit/?id=99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3 > > > > > > commit 99cb3dca4773fe4a16c500f9cb55fcd62cd8d7f3 > > > Author: Mark Johnston <[email protected]> > > > AuthorDate: 2025-11-11 14:47:06 +0000 > > > Commit: Mark Johnston <[email protected]> > > > CommitDate: 2025-11-11 14:58:59 +0000 > > > > > > vnode: Rework vput() to avoid holding the vnode lock after > > > decrementing > > > > > > It is not safe to modify the vnode structure after releasing one's > > > reference. Modify vput() to avoid this. Use > > > refcount_release_if_last() > > > to opportunistically call vput_final() with the vnode lock held > > > since we > > > need the vnode lock in order to deactivate the vnode, and it's silly > > > to > > > drop the vnode lock and immediately reacquire it in this common case. > > > > > > Note that vunref() has a similar flaw. D52628 aims to fix the > > > problem > > > more holistically, but this change fixes observable panics in the > > > meantime. > > > > > > Reported by: [email protected] > > > Reported by: [email protected] > > > Reviewed by: kib > > > MFC after: 3 days > > > Differential Revision: https://reviews.freebsd.org/D52608 > > > --- > > > sys/kern/vfs_subr.c | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > > > index 58975f7ac932..9cf983f6f89d 100644 > > > --- a/sys/kern/vfs_subr.c > > > +++ b/sys/kern/vfs_subr.c > > > @@ -3713,11 +3713,12 @@ vput(struct vnode *vp) > > > > > > ASSERT_VOP_LOCKED(vp, __func__); > > > ASSERT_VI_UNLOCKED(vp, __func__); > > > - if (!refcount_release(&vp->v_usecount)) { > > > - VOP_UNLOCK(vp); > > > + if (refcount_release_if_last(&vp->v_usecount)) { > > > + vput_final(vp, VPUT); > > > return; > > > } > > > - vput_final(vp, VPUT); > > > + VOP_UNLOCK(vp); > > > + vrele(vp); > > > } > > > > This commit results in reliable hangs for me when I do a 'make -j 16 > > buildkernel' > > in a VM where the source directory is mounted over NFS. Even if there is > > nothing > > to build so the make is just traversing all the directories. The hung > > process is > > stuck on a vnode lock. > > > > Some of the stacks from procstat: > > > > PID TID COMM TDNAME KSTACK > > 1858 100343 make - mi_switch+0x172 > > sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 > > VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 > > _vn_lock+0x53 vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 > > vop_sigdefer+0x30 VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d > > vn_open_cred+0x592 openatfp+0x2b7 > > root@head:~ # procstat -kk 1860 > > 1860 100360 make - mi_switch+0x172 > > sleepq_switch+0x109 sleeplk+0x110 lockmgr_slock_hard+0x3c3 > > VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 > > _vn_lock+0x53 vget_finish+0xaf cache_lookup+0x2be nfs_lookup+0x4e1 > > vop_sigdefer+0x30 VOP_LOOKUP_APV+0x57 vfs_lookup+0x5aa namei+0x35d > > vn_open_cred+0x592 openatfp+0x2b7 > > root@head:~ # procstat -kk 1855 > > 1855 100314 make - mi_switch+0x172 > > sleepq_switch+0x109 sleeplk+0x110 lockmgr_xlock_hard+0x3d0 > > VOP_LOCK1_APV+0x32 nfs_lock+0x2c vop_sigdefer+0x30 VOP_LOCK1_APV+0x32 > > _vn_lock+0x53 vput_final+0x269 vput+0xab vfs_lookup+0xa7a namei+0x35d > > vn_open_cred+0x592 openatfp+0x2b7 sys_open+0x28 amd64_syscall+0x169 > > fast_syscall_common+0xf8 > > > > This last one is kind of interesting as it calls vput_final? > > > > (kgdb) p vp > > $2 = (struct vnode *) 0xfffff8016bb6c898 > > (kgdb) p vp->v_lock > > $3 = {lock_object = {lo_name = 0xffffffff81467118 <nfs_vnode_tag> "nfs", > > lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900}, > > lk_lock = 39, lk_exslpfail = 0, lk_pri = 44, lk_timo = 6} > > (kgdb) p/x vp->v_lock->lk_lock > > $4 = 0x27 > > > > So there is one shared lock, and both shared and exclusive waiters. Seems > > like > > a vnode lock has been leaked? Or rather, the vput_final() in this case is > > called from vrele(), so the vput() for 1855 has hit the new path. I wonder > > if > > when it dropped its lock via VOP_UNLOCK() it is no longer eligible to > > re-lock > > it as it needs some other sharer to also unlock before it can re-acquire the > > lock in vput_final? LK_UPGRADE attempts don't block if there are shared > > waiters > > but only one 1 shared lock held, but if you do LK_UNLOCK and then LK_XLOCK > > that > > will indeed honor the other waiters and not acquire the lock, so a vput that > > held the last shared lock would have "worked" to ugprade before even with > > other waiters, but now it can drop the lock and then get stuck. I guess the > > question though in this case is that some other thread must have snuck in > > and acquired a shared lock between the VOP_UNLOCK and the vrele() and is > > blocked while holding it. > > > > Aha, so most of the other makes are stuck in VOP_LOOKUP on this same vnode > > (which I believe is "/usr/src/sys/dev" as in 1855 the vput() is inside of > > namei() and this is the state of ndp: > > > > (kgdb) p *ndp > > $9 = { > > ni_dirp = 0x481afdaa2d7d <error: Cannot access memory at address > > 0x481afdaa2d7d>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = > > 0xfffffe00d83d2d90, > > ni_startdir = 0x0, ni_rootdir = 0xfffff80005e461b8, ni_topdir = 0x0, > > ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {0, > > 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, > > ni_vp = 0xfffff801bc007000, ni_dvp = 0xfffff8016bb6c898, ni_resflags = 1, > > ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, > > ni_next = 0xfffff8000a0b0817 "", ni_cnd = {cn_flags = 8936259908, > > cn_cred = 0xfffff80005e53c00, cn_nameiop = LOOKUP, cn_lkflags = 524288, > > cn_pnbuf = 0xfffff8000a0b0800 "/usr/src/sys/dev/acpica", > > cn_nameptr = 0xfffff8000a0b0811 "acpica", cn_namelen = 6}, > > ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe00d83d2d48}, > > ni_rbeneath_dpp = 0x0, ni_nctrack_mnt = 0x0, ni_dvp_seqc = 0, ni_vp_seqc > > = 0} > > > > But pid 1844 is a make blocked on another vnode: > > > > (kgdb) p vp > > $24 = (struct vnode *) 0xfffff801bc007000 > > > > And that one appears to be the "acpica" child directory: > > > > (kgdb) p *vp->v_cache_dst.tqh_first > > $27 = {nc_src = {le_next = 0xfffff801bc01b9b8, le_prev = > > 0xfffff801bc01b898}, > > nc_dst = {tqe_next = 0x0, tqe_prev = 0xfffff801bc007058}, nc_hash = { > > csle_next = 0x0}, nc_dvp = 0xfffff8016bb6c898, n_un = { > > nu_vp = 0xfffff801bc007000, nu_neg = {neg_flag = 0 '\000', > > neg_hit = 112 'p'}}, nc_flag = 12 '\f', nc_nlen = 6 '\006', > > nc_name = 0xfffff801bc01b962 "acpica"} > > > > And note it's nc_dvp is the vnode 1855 is trying to vput and all the other > > makes are trying to VOP_LOCK in nfs_lookup(). > > > > So my guess is that 1844 was able to VOP_LOCK() "/usr/src/sys/dev" in the > > window between VOP_UNLOCK() and the vput_final() invoked from vrele(). > > > > Humm, except that the vnode in question shouldn't be in vput_final at all?? > > > > Back to 1855 (proc doing vput_final): > > > > (kgdb) > > #12 0xffffffff80c9bc89 in vput_final (vp=0xfffff8016bb6c898, > > func=func@entry=VRELE) at /usr/src/sys/kern/vfs_subr.c:3628 > > 3628 error = vn_lock(vp, LK_EXCLUSIVE | > > LK_INTERLOCK); > > (kgdb) p vp > > $6 = (struct vnode *) 0xfffff8016bb6c898 > > (kgdb) p vp->v_usecount > > $7 = 14 > > (kgdb) p vp->v_holdcnt > > $8 = 7 > > > > Or did the other make's manage to acquire new use counts while 1855 was > > blocked > > on the lock in vput_final? > > > > Anyway, it seems 14 make's are blocked on this vnode (/usr/src/sys/dev). > > 1844 > > is blocked on the "acpica" vnode, and another make process (1856) is > > blocked on > > "acpica", but via getdirentries() instead of namei(): > > > > #4 0xffffffff80b62ec0 in sleeplk (lk=lk@entry=0xfffff801bc007070, > > flags=flags@entry=2098176, ilk=ilk@entry=0xfffff801bc007098, > > wmesg=wmesg@entry=0xffffffff81467118 <nfs_vnode_tag> "nfs", > > pri=<optimized out>, pri@entry=44, timo=timo@entry=6, queue=1) > > at /usr/src/sys/kern/kern_lock.c:302 > > #5 0xffffffff80b60fc3 in lockmgr_slock_hard (lk=0xfffff801bc007070, > > flags=2098176, ilk=0xfffff801bc007098, file=<optimized out>, line=4333, > > lwa=0x0) at /usr/src/sys/kern/kern_lock.c:693 > > #6 0xffffffff811b97a2 in VOP_LOCK1_APV ( > > vop=0xffffffff81aeeb38 <default_vnodeops>, > > a=a@entry=0xfffffe00d8445c88) > > at vnode_if.c:2103 > > #7 0xffffffff80a6050c in nfs_lock (ap=ap@entry=0xfffffe00d8445c88) > > at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:344 > > #8 0xffffffff80c83d90 in vop_sigdefer (vop=<optimized out>, > > a=0xfffffe00d8445c88) at /usr/src/sys/kern/vfs_default.c:1502 > > #9 0xffffffff811b97a2 in VOP_LOCK1_APV ( > > vop=0xffffffff81aada28 <newnfs_vnodeops>, a=a@entry=0xfffffe00d8445c88) > > at vnode_if.c:2103 > > #10 0xffffffff80cb6b43 in VOP_LOCK1 (vp=0xfffff801bc007000, flags=2098176, > > file=0xffffffff81359deb "/usr/src/sys/kern/vfs_syscalls.c", line=4333) > > at ./vnode_if.h:1316 > > #11 _vn_lock (vp=vp@entry=0xfffff801bc007000, flags=flags@entry=2098176, > > file=<unavailable>, line=<unavailable>, line@entry=4333) > > at /usr/src/sys/kern/vfs_vnops.c:1970 > > #12 0xffffffff80cb2d94 in kern_getdirentries (td=0xfffff80006cb5000, > > fd=<optimized out>, > > buf=0x2088dba27000 <error: Cannot access memory at address > > 0x2088dba27000>, count=4096, basep=basep@entry=0xfffffe00d8445df0, > > residp=residp@entry=0x0, > > bufseg=UIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4333 > > #13 0xffffffff80cb32a9 in sys_getdirentries (td=<unavailable>, > > uap=0xfffff80006cb5428) at /usr/src/sys/kern/vfs_syscalls.c:4287 > > > > The "acpica" lock is held by a writer at least: > > > > (kgdb) p vp > > $9 = (struct vnode *) 0xfffff801bc007000 > > (kgdb) p vp->v_lock > > $10 = {lock_object = {lo_name = 0xffffffff81467118 <nfs_vnode_tag> "nfs", > > lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffff8043fd83900}, > > lk_lock = 18446735277917198210, lk_exslpfail = 0, lk_pri = 44, lk_timo = > > 6} > > (kgdb) p/x vp->v_lock.lk_lock > > $14 = 0xfffff80011ebd782 > > (kgdb) set $td = (struct thread *)0xfffff80011ebd780 > > (kgdb) p $td->td_tid > > $15 = 100314 > > (kgdb) p $td->td_proc->p_comm > > $16 = "make", '\000' <repeats 15 times> > > (kgdb) p $td->td_proc->p_pid > > $17 = 1855 > > > > Oh dear, so that's the deadlock. > > > > 1855 needs to get an exclusive lock on "/usr/src/sys/dev/" to finish its > > vput() > > _while_ it holds an exclusive lock on "/usr/src/sys/dev/acpica". And the > > other > > shared lock on "/usr/src/sys/dev" is held by 1844 who is blocked trying to > > share > > lock "/usr/src/sys/dev/acpica" during lookup(). > > > > So by doing the VOP_UNLOCK() and then reacquiring the lock via vrele(), > > you've > > introduced a vnode LOR as the vput() of the parent directory can now try to > > re-lock the vnode of the parent directory while we hold a vnode on the > > child vnode. > > > > This should be easy to fix though. > > While the routine must not vrele, the VPUT should be patched to > trylock (aka LK_NOWAIT) and EAGAIN if that fails, which in turn will > punt the vnode to a workqueue. > > However, one has to wonder if failing to lock here will be frequent > enough to constitute a problem. There is a counter for these > (deferred_inactive). > > The following was only boot-tested:
This patch also causes vput() to unlock and immediately re-lock the vnode when deactivating, which is a common case. For 15.0 at least, I'd rather work around the bug more directly: https://reviews.freebsd.org/D53774 > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index 58975f7ac932..5cde78162e6a 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -3640,11 +3640,8 @@ vput_final(struct vnode *vp, enum vput_op func) > break; > case VPUT: > want_unlock = true; > - if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE) { > - error = VOP_LOCK(vp, LK_UPGRADE | LK_INTERLOCK | > - LK_NOWAIT); > - VI_LOCK(vp); > - } > + error = VOP_LOCK(vp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT); > + VI_LOCK(vp); > break; > case VUNREF: > if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE) { > @@ -3672,12 +3669,9 @@ vput_final(struct vnode *vp, enum vput_op func) > } > if (func == VUNREF) > vp->v_vflag &= ~VV_UNREF; > - vdropl(vp); > - return; > out: > - if (func == VPUT) > - VOP_UNLOCK(vp); > vdropl(vp); > + return; > } > > /* > @@ -3713,8 +3707,8 @@ vput(struct vnode *vp) > > ASSERT_VOP_LOCKED(vp, __func__); > ASSERT_VI_UNLOCKED(vp, __func__); > + VOP_UNLOCK(vp); > if (!refcount_release(&vp->v_usecount)) { > - VOP_UNLOCK(vp); > return; > } > vput_final(vp, VPUT);
