> On Mar 15, 2019, at 2:48 PM, Robert Elz <[email protected]> wrote:
> Upon reflection, there is no hurry to fix this one, unlike the previous
> one which was screwing up the b5 tests - we (at least currently) have no
> tests which do anything as crazy as the code sequence to trigger this, so
> we can take our time and solve it properly.
Well, true, but I think a "fix before netbsd-9, with pull-ups to -8 and -7" is
certainly a worthy goal. After all, there is now a known sequence of calls
that can cause a crash.
> | POSIX's semantics could just as well be represented with a bit
> | in a flags word,
>
> the one in the UVM map entry - yes, but that one isn't really the
> issue. What matters is the pmap count, and even in posix that needs
> to be a count, as multiple processes can independently lock the same
> (shared) region, and neither one's unwire affects the wiring done by
> another.
>
> Unless my assumptions about what is what here are incorrect (which they
> easily could be) the count that matters is the one which needs to remain
> a count.
The pmap layer doesn't really have a count. It just has a "this PTE is wired"
bit. When the vm_map_entry that covers that PTE transitions from "not-wired"
to "wired", the PTE gets the wired bit; when the vm_map_entry transitions from
"wired" to "not-wired", the PTE loses the bit. It's really as simple as that.
The pmap layer doesn't assume a count, it just depends on the upper layers
keeping track of the state transitions, and updating the bottom layer
accordingly.
The same goes for the backing pages -- you've probably noticed that the pages
are wired or unwired only at those rising and falling edges of vm_map_entry
"wired-ness". The pages, of course, do carry a count, because multiple
mappings can exist for a single page.
UVM history lesson time! In some ways it's slightly silly to even have a
wire_count in the vm_map_entry, because vm_map_entry's are not really shared
... they exist only in a single vm_map, and they correspond to one or more PTEs
in the pmap's tables (one pmap per uvm_map)... but the count is in some ways an
artifact of how uvm_vslock() / uvm_vsunlock() used to work ... they *used* to
call uvm_map_pageable() (because the old Mach VM implementation used to call
vm_map_pageable()) for doing physio and other things that necessitated wiring
down user buffers so the kernel / devices could safely access them. But that
changed some 2 decades ago (again, I think this may have been my fault :-) for
a couple of reasons:
(1) munlock(2) and its semantics; you don't want it to unwire the
buffers that a device is going to DMA into!
(2) uvm_map_pageable() can fragment the map because of the entry
clipping.
...so the transient wirings used by uvm_vslock() and uvm_vsunlock() were
changed to use uvm_fault_wire() and uvm_fault_unwire() directly, to
specifically fiddle with the wired-ness of the underlying pages, while leaving
the vm_map_entry's unchanged.
> | I would suggest that the right way to fix this would be as follows:
>
> I think we ought to work out what the data structs should look like
> in the various possible cases - including mixed shm and m*() allocations,
> mappings, wiring, protection schemes - including where pages are
> mapped (either in more than once in one process, or in different
> processes) in both forms (a page that is a shm in one place is mmap'd
> in another, and wired by one of them, or both, or neither).
This should be relatively straightforward... I'll see if I can put together a
couple of diagrams this weekend between various kid / household duties (and
also recovering from this bout of late winter flu that's kept me out of my
$DayJob office for a couple of days, bleh). The wiring propagation between the
various layers is really all about rising and falling edges, and once you
understand the rules, it's pretty easy to work out what the data structures at
each layer should look like for any given scenario. In fact, the current code
mostly follows those rules; the bugs, it seems, are really in defining what
constitutes a rising or falling edge.
> Until we know what it will look like, I don't think trying to find
> minimal code changes from what we have now will be productive.
>
> First we need an audit of everything that affects or uses the UVM
> mappings to see just what is required. The shm stuff is easy that
> way, as they have a very small visible footprint - even if they are
> an ugly design.
>
> Tomorrow (or much later today, or whatever you want to call it!)
>
> kre
>
-- thorpej