:Hi,
:
:I've just tripped over an obviously long-standing (since about
:Jan. 1998) bug in vinvalbuf while looking into PR kern/26224. The
:problematic code looks like (on -CURRENT):
:
: /*
: * Destroy the copy in the VM cache, too.
: */
: mtx_lock(&vp->v_interlock);
: if (VOP_GETVOBJECT(vp, &object) == 0) {
: vm_object_page_remove(object, 0, 0,
: (flags & V_SAVE) ? TRUE : FALSE);
: }
: mtx_unlock(&vp->v_interlock);
:
:The locks seem to be needed for file systems that don't perform real
:locking (on -STABLE, they are simplelocks).
:This, however, is incorrect because vm_object_page_remove may sleep.
:I've attached a patch that passes the interlock to
:vm_object_page_remove, which in turn passes it to a modified version
:of vm_page_sleep, which unlocks it around the sleep.
:I think that this is correct, because the object should be in a valid
:state when we sleep (and all checks are reexecuted in that case).
:
:Since I'm not very experienced with vfs and vm stuff, I'd be glad if
:this patch could get some review. In particular, is the lock/unlock
:pair really still needed, and are there notable differeces in -STABLE
:(because the fix would need the be MFC'ed)?
:
:Thanks,
: - thomas
Ok, I've looked at this more. What is supposed to happen is that
the 'while (vp->v_numoutput) ...' code just above the section you
cited is supposed to prevent the deadlock. The while code looks like
this:
while (vp->v_numoutput > 0) {
vp->v_flag |= VBWAIT;
tsleep(&vp->v_numoutput, PVM, "vnvlbv", 0);
}
However, as Ian points out in his followup, it doesn't work:
:...
:I've seen a related deadlock here in 4.x with NFS; vm_fault locks
:a page and then calls vput which aquires the v_interlock. This code
:in vinvalbuf locks the v_interlock first, and then vm_object_page_remove()
:locks the page. That's a simple lock-order reversal deadlock which I
:guess would still exist even with this patch.
It doesn't work for the simple reason that vm_fault isn't busying the
page for writing, it's busying it for reading, so v_numoutput will be
0.
I think the best solution is to change vinvalbuf's wait loop to
wait on vm_object->paging_in_progress instead of vp->v_numoutput,
or perhaps to wait for both conditions to be satisfied.
vm_object->paging_in_progress applies to reads and writes, while
vp->v_numoutput only applies to writes.
I know this isn't the most correct solution... there is still the
lock reversal if vm_object_remove_pages() were ever to sleep. The
idea is that it won't sleep if there is no paging in progress because
nobody will have any of the object's pages locked. I think it is
the best we can do for the moment. There are several places in
vm/vm_object.c where the same assumption is made (testing against
vm_object->paging_in_progress, though, which works, not vp->v_numoutput).
-
Thomas, if you are interested this could be a little project for you.
See if you can code-up another while() loop to also wait for
the object's paging_in_progress count to become 0 in the vinvalbuf()
code. Look in vm/vm_object.c for examples of how to construct such
a loop. I will be happy to review and test whatever you come up with
and I'm sure Ian will too!
-Matt
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message