> address space. There are no per-process mprotect semantics (at least not in
> most OSs I know of).

OT, but there's now `pkey_mprotect`, which is thread local IIRC.

> - The effect of mprotect is not guaranteed to (and should never expected to)
> appear atomically to all threads. The order in which protection changes
> applies can vary, and is virtually impossible to predict.

Yeah, it's the per-CPU TLB cache invalidation that I try to (mentally)
model using an mprotect_on_thread (or mprotect_on_cpu).

> - In practice, protect implementations typically impose their semantic
> changes by changing a memory-resident page table, followed by TLB-invalidate
> request signals (an interrupt, typically) to all processor (logical) cores
> that are executing threads in the process. Once all involved cores respond
> to the TLB invalidate request, the change is known to be committed, as no
> thread in the process can observe the pre-change page table entry state.

So I assume it's also this request that (at least effectively) flushes
the necessary pipelines/buffers/caches to make sure the TLB
invalidation is ordered wrt other memory operations on the affected
thread (CPU/core).

> So while I think that your more specific statements about case (2) above
> will hold, the (3) transitive thing between thread that observed a fault and
> one that didn't is unlikely.

Thanks. Cool! That matches what I expected now based on your explanation.

> BTW, one of the interesting APIs we use for performance with the C4
> collector is a set of no-TLB-invalidating semantic parallels for mprotect(),
> mremap(), and munmap(). We separate TLB invalidation from address space
> mapping changes, and enforce TLB invalidation only at very coarse,
> explicitly requested boundaries. The collector accepts that the page table
> may be potentially inconsistent across most operations, and enforces
> consistency (via explicit TLB invalidate requests) only at points where it
> actually needs it. Since TLB invalidates represent the bulk of execution
> time cost for mprotect(), mremap(), and munmap() calls, this provides us
> with dramatically higher MBs-of-address-space-affected-per-second metrics.
> You can find some early discussion and old numbers in the C4 paper,
> including some reasoning for why a high map-changing rate is needed for
> sustaining reasonable allocation rates in collectors that perform such
> changes in the main/common case compaction paths (see section 5 in the
> paper).

Yeah, I've read that paper before, though it was very hard for me to
reason about what can be expected after such a no-invalidation
mprotect, especially since (IIUC) the change could still be seen by
another thread before the batch invalidation, because the stale TLB
entry on that thread's CPU can be evicted for other reasons. Is it
similar to reasoning about a relaxed memory model?
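For what it's worth, my mental model of the batched variant is roughly the sketch below. The names `mprotect_no_sync` and `tlb_invalidate_all` are made up for illustration; the actual API you describe is not something I've seen published beyond the paper.

```
/* hypothetical pseudocode -- not a real API */
for (page in pages_to_unmap) {
    /* Page table updated, but no shootdown: threads holding a stale
       TLB entry may still access the page with its old permissions,
       while threads that miss in the TLB see the new state already. */
    mprotect_no_sync(page, len, PROT_NONE);
}

/* One shootdown covering the whole batch; after this returns, every
   thread observes all of the changes above. */
tlb_invalidate_all();
```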

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.