On 23/01/23(Mon) 22:57, David Hill wrote: > On 1/20/23 09:02, Martin Pieuchot wrote: > > > [...] > > > Ran it 20 times and all completed and passed. I was also able to > > > interrupt > > > it as well. no issues. > > > > > > Excellent! > > > > Here's the best fix I could come up with. We mark the VM map as "busy" > > during the page fault just before the faulting thread releases the shared > > lock. This ensures no other thread will grab an exclusive lock until the > > fault is finished. > > > > I couldn't trigger the reproducer with this, can you? > > Yes, same result as before. This patch does not seem to help.
Is it the same as before? I doubt it is. On a 4-CPU machine I can't trigger the race described in this thread. On a 8-CPU one I now see all threads sleeping on "thrsleep" except one in "kqread" and one in "wait". I don't know what's happening. I don't know how to debug Go code. I can't say if this is a race in UVM or something else. The original race reported in this thread seems fixed by the diff I sent. I don't know if there's another one or if I missed something. I can't figure it out by myself, I'd appreciated if somebody else with some knowledge in Go could give me a hand. Joel could you find some time to reproduce the hang with the diff below applied? To reproduce, just do: $ doas pkg_add git go $ git clone https://github.com/etcd-io/bbolt.git $ cd bbolt $ git checkout v1.3.6 $ go test -v -run TestSimulate_10000op_10p Index: uvm/uvm_fault.c =================================================================== RCS file: /cvs/src/sys/uvm/uvm_fault.c,v retrieving revision 1.133 diff -u -p -r1.133 uvm_fault.c --- uvm/uvm_fault.c 4 Nov 2022 09:36:44 -0000 1.133 +++ uvm/uvm_fault.c 20 Jan 2023 13:52:58 -0000 @@ -1277,6 +1277,9 @@ uvm_fault_lower(struct uvm_faultinfo *uf /* update rusage counters */ curproc->p_ru.ru_majflt++; + /* prevent siblings to grab an exclusive lock on the map */ + vm_map_busy(ufi->map); + uvmfault_unlockall(ufi, amap, NULL); counters_inc(uvmexp_counters, flt_get); @@ -1293,13 +1296,16 @@ uvm_fault_lower(struct uvm_faultinfo *uf KASSERT(result != VM_PAGER_PEND); if (result == VM_PAGER_AGAIN) { + vm_map_unbusy(ufi->map); tsleep_nsec(&nowake, PVM, "fltagain2", MSEC_TO_NSEC(5)); return ERESTART; } - if (!UVM_ET_ISNOFAULT(ufi->entry)) + if (!UVM_ET_ISNOFAULT(ufi->entry)) { + vm_map_unbusy(ufi->map); return (EIO); + } uobjpage = PGO_DONTCARE; uobj = NULL; @@ -1308,6 +1314,7 @@ uvm_fault_lower(struct uvm_faultinfo *uf /* re-verify the state of the world. */ locked = uvmfault_relock(ufi); + vm_map_unbusy(ufi->map); if (locked && amap != NULL) amap_lock(amap);