On Sat, Jun 02, 2018 at 03:08:14PM -0700, Philip Guenther wrote:
> On Sat, 2 Jun 2018, Christophe Prévotaux wrote:
> > This a witness report I got on boot with snapshot Jun 1st amd64
> >
> > root on sd0a (9b49e3196b9bfae8.a) swap on sd0b dump on sd0b
> > lock order reversal:
> > 1st 0xffffff021cdac180 vmmaplk (&map->lock) @
> > /usr/src/sys/uvm/uvm_map.c:4433
> > 2nd 0xffffff01dc5f71a8 inode (&ip->i_lock)
>
> I believe uvm and the vnode layer handle this correctly, with lock tries
> that fall back to releasing the other lock and retrying so progress is
> made.  The fix for WITNESS complaining is to mark vmmaplk as a vnode lock.
I think there is an actual issue because the locking calls are
unconditional. FreeBSD appears to work around the problem by unlocking
the vm_map when calling the pager. The diff below adapts that logic to
OpenBSD.

Because the temporary unlocking may allow another thread to change the
vm_map, the code has to check whether the map has been altered since the
unlocking, and if so, handle the situation somehow. The patch uses a
best-effort approach where the code proceeds from the vm_map entry
indicated by the end address of the current vm_map entry. The sanity
checks that are done at the start of uvm_map_clean() are not rerun.

The system call that triggers the reversal is msync(2), and the reversal
can be reproduced with the sys/kern/mmap regression test. sys/kern/mmap3
shows that there is another, similar reversal with mlock(2) which is not
covered by this patch.

Index: uvm/uvm_map.c
===================================================================
RCS file: src/sys/uvm/uvm_map.c,v
retrieving revision 1.237
diff -u -p -r1.237 uvm_map.c
--- uvm/uvm_map.c	18 Apr 2018 16:05:21 -0000	1.237
+++ uvm/uvm_map.c	3 Jun 2018 10:42:59 -0000
@@ -4420,8 +4420,11 @@ uvm_map_clean(struct vm_map *map, vaddr_
 	struct vm_page *pg;
 	struct uvm_object *uobj;
 	vaddr_t cp_start, cp_end;
+	vaddr_t ent_start, ent_end;
+	voff_t ent_offset;
 	int refs;
 	int error;
+	unsigned int timestamp;
 	boolean_t rv;
 
 	KASSERT((flags & (PGO_FREE|PGO_DEACTIVATE)) !=
@@ -4450,8 +4453,7 @@ uvm_map_clean(struct vm_map *map, vaddr_
 	}
 
 	error = 0;
-	for (entry = first; entry != NULL && entry->start < end;
-	    entry = RBT_NEXT(uvm_map_addr, entry)) {
+	for (entry = first; entry != NULL && entry->start < end; ) {
 		amap = entry->aref.ar_amap;	/* top layer */
 		if (UVM_ET_ISOBJ(entry))
 			uobj = entry->object.uvm_obj;
@@ -4546,13 +4548,27 @@ flush_object:
 		    ((flags & PGO_FREE) == 0 ||
 		     ((entry->max_protection & PROT_WRITE) != 0 &&
 		      (entry->etype & UVM_ET_COPYONWRITE) == 0))) {
+			ent_start = entry->start;
+			ent_end = entry->end;
+			ent_offset = entry->offset;
+			timestamp = map->timestamp;
+			vm_map_unlock_read(map);
+
 			rv = uobj->pgops->pgo_flush(uobj,
-			    cp_start - entry->start + entry->offset,
-			    cp_end - entry->start + entry->offset, flags);
+			    cp_start - ent_start + ent_offset,
+			    cp_end - ent_start + ent_offset, flags);
 
 			if (rv == FALSE)
 				error = EFAULT;
-		}
+
+			vm_map_lock_read(map);
+			if (timestamp != map->timestamp)
+				entry = uvm_map_entrybyaddr(&map->addr,
+				    ent_end);
+			else
+				entry = RBT_NEXT(uvm_map_addr, entry);
+		} else
+			entry = RBT_NEXT(uvm_map_addr, entry);
 	}
 
 	vm_map_unlock_read(map);