On Sat, Jun 02, 2018 at 03:08:14PM -0700, Philip Guenther wrote:
> On Sat, 2 Jun 2018, Christophe Prévotaux wrote:
> > This is a witness report I got on boot with the Jun 1st amd64 snapshot
> >
> > root on sd0a (9b49e3196b9bfae8.a) swap on sd0b dump on sd0b
> > lock order reversal:
> > 1st 0xffffff021cdac180 vmmaplk (&map->lock) @
> > /usr/src/sys/uvm/uvm_map.c:4433
> > 2nd 0xffffff01dc5f71a8 inode (&ip->i_lock)
>
> I believe uvm and the vnode layer handle this correctly, with lock tries
> that fall back to releasing the other lock and retrying so progress is
> made. The fix for WITNESS complaining is to mark vmmaplk as a vnode lock.

I think there is an actual issue because the locking calls are
unconditional. FreeBSD appears to work around the problem by unlocking
the vm_map when calling the pager. The diff below adapts that logic
to OpenBSD.

Because the temporary unlocking may allow another thread to change the
vm_map, the code has to check whether the map has been altered during
the window, and if so, recover. The patch takes a best-effort approach:
when the map's timestamp has changed, iteration resumes at the vm_map
entry found at the end address of the entry that was just flushed. The
sanity checks done at the start of uvm_map_clean() are not rerun.

The system call that triggers the reversal is msync(2), and the
reversal can be reproduced with the sys/kern/mmap regression test.
sys/kern/mmap3 shows that there is a similar reversal with mlock(2),
which this patch does not cover.

Index: uvm/uvm_map.c
===================================================================
RCS file: src/sys/uvm/uvm_map.c,v
retrieving revision 1.237
diff -u -p -r1.237 uvm_map.c
--- uvm/uvm_map.c 18 Apr 2018 16:05:21 -0000 1.237
+++ uvm/uvm_map.c 3 Jun 2018 10:42:59 -0000
@@ -4420,8 +4420,11 @@ uvm_map_clean(struct vm_map *map, vaddr_
struct vm_page *pg;
struct uvm_object *uobj;
vaddr_t cp_start, cp_end;
+ vaddr_t ent_start, ent_end;
+ voff_t ent_offset;
int refs;
int error;
+ unsigned int timestamp;
boolean_t rv;
KASSERT((flags & (PGO_FREE|PGO_DEACTIVATE)) !=
@@ -4450,8 +4453,7 @@ uvm_map_clean(struct vm_map *map, vaddr_
}
error = 0;
- for (entry = first; entry != NULL && entry->start < end;
- entry = RBT_NEXT(uvm_map_addr, entry)) {
+ for (entry = first; entry != NULL && entry->start < end; ) {
amap = entry->aref.ar_amap; /* top layer */
if (UVM_ET_ISOBJ(entry))
uobj = entry->object.uvm_obj;
@@ -4546,13 +4548,27 @@ flush_object:
((flags & PGO_FREE) == 0 ||
((entry->max_protection & PROT_WRITE) != 0 &&
(entry->etype & UVM_ET_COPYONWRITE) == 0))) {
+ ent_start = entry->start;
+ ent_end = entry->end;
+ ent_offset = entry->offset;
+ timestamp = map->timestamp;
+ vm_map_unlock_read(map);
+
rv = uobj->pgops->pgo_flush(uobj,
- cp_start - entry->start + entry->offset,
- cp_end - entry->start + entry->offset, flags);
+ cp_start - ent_start + ent_offset,
+ cp_end - ent_start + ent_offset, flags);
if (rv == FALSE)
error = EFAULT;
- }
+
+ vm_map_lock_read(map);
+ if (timestamp != map->timestamp)
+ entry = uvm_map_entrybyaddr(&map->addr,
+ ent_end);
+ else
+ entry = RBT_NEXT(uvm_map_addr, entry);
+ } else
+ entry = RBT_NEXT(uvm_map_addr, entry);
}
vm_map_unlock_read(map);