Careful, Robert, you're going to end up the designated maintainer of UVM at 
this rate :-)

> On Mar 15, 2019, at 12:11 PM, Robert Elz <k...@munnari.oz.au> wrote:
> 
> However, the relationship between the shm() functions and the uvm
> (m*() functions) looks to be something of a mess - we really cannot keep
> a state where by doing bizarre combinations of calls from those two
> sets, which internally seem to not know much about each other, can
> either make a mess of the kernel data structures, or crash the kernel.

"Yuck."

So, there are two competing, incompatible API contracts here:

1==> SVID shared memory's "locking" extension that requires balanced 
wire/unwire calls.

2==> POSIX Realtime extensions mlock(2) / munlock(2), which does not require 
balancing wire/unwire.

SVID's maps nicely to the "wire_count" field in the vm_map_entry_t.  POSIX does 
not, but we sort of coerce it by letting it increment wantonly and then forcing 
to 0 when munlock(2) comes in (I'm pretty sure this is my fault, actually; I 
will find a sword to fall on).

POSIX's semantics could just as well be represented with a bit in a flags word, 
and handily there is one in vm_map_entry_t (why it's a uint8_t when just about 
every ABI is going to round that to at least an int, in terms of allocated 
space, *shrug*).

I would suggest that the right way to fix this would be as follows:

1==> Use a bit in the entry flags to hold the POSIX "mlock" wiring state.  This 
includes entries wired as a result of VM_MAP_WIREFUTURE (which is a proxy for 
mlockall(2)'s MCL_FUTURE).

2==> Change VM_MAPENT_ISWIRED() to consult this flags bit in addition to the 
wire_count.  Audit the code to make sure everything uses this macro rather than 
just looking at entry->wire_count.  (This audit could be done first, 
independently.)

3==> Add a new flag to uvm_map_pageable() that says you're performing an 
mlock-sematics wiring change.  If that flag it set, you either fiddle the bit, 
or fiddle the count.  A new flag to indicate "no, REALLY unwire it" that clears 
both the bit and sets the wire_count to 0 might also be necessary to handle 
certain cases (like removing mappings, process exit / exec, etc. -- I think 
uvm_map_pageable() is used in those cases before the map entry is removed, but 
I'm not 100% sure).

4==> A convenience function to determine if a map entry's wiring state has 
changed that considers the mlock bit as well as the wire_count, maybe something 
like this?

bool
uvm_map_entry_wiring_changed(
        const vm_map_entry_t * const entry,
        const u_int new_mlock_flag,
        const int wire_count_delta)
{
        const bool mlock_flag_changed = ((entry->flags & UVM_MAP_MLOCKED) ^ 
new_mlock_flag) != 0;
        const int new_wire_count = entry->wire_count + wire_count_delta;
        const bool wire_count_changed = entry->wire_count ? new_wire_count == 0 
: new_wire_count != 0;

        return mlock_flag_changed || wire_count_changed;
}

Use of this function would replace checking if the wiring count is 0 (and about 
to become 1) or 1 (and about to become 0) in various places.

It seems like doing this would allow both APIs to be used together without 
mangling the internal kernel state.  Of course munlock() would not actually 
unwire SVID shared memory regions, but that seems OK ... the two APIs are 
independent, and it seems like this is a reasonable "undefined" behavior 
resulting from mixing them together.

I don't think this would actually be a ton of work, and I'd be happy to guide 
you along.

-- thorpej

Reply via email to