Some points that may help with expectations and understanding:

- mprotect() works on a process as a whole: it applies to the process's (single) address space, and to all threads in that process that share that address space. There are no per-thread mprotect semantics (at least not in most OSs I know of).
- The effect of mprotect is not guaranteed to (and should never be expected to) appear atomically to all threads. The order in which protection changes apply can vary, and is virtually impossible to predict.
- TLB caches are [on all hardware I've ever played with] not coherent. TLB invalidates apply to logical CPU cores, not to threads.
- In practice, mprotect implementations typically impose their semantic changes by modifying a memory-resident page table, followed by TLB-invalidate request signals (an interrupt, typically) to all processor (logical) cores that are executing threads in the process. Once all involved cores respond to the TLB-invalidate request, the change is known to be committed, as no thread in the process can observe the pre-change page table entry state.
- As any thread could be executing on any core, and one core typically has no idea what specific thread another core is running, the order of invalidation across threads can vary erratically.

So while I think that your more specific statements about case (2) above will hold, the (3) transitive thing between a thread that observed a fault and one that didn't is unlikely.

BTW, one of the interesting APIs we use for performance with the C4 collector is a set of non-TLB-invalidating semantic parallels for mprotect(), mremap(), and munmap(). We separate TLB invalidation from address space mapping changes, and enforce TLB invalidation only at very coarse, explicitly requested boundaries. The collector accepts that the page table may be potentially inconsistent across most operations, and enforces consistency (via explicit TLB-invalidate requests) only at points where it actually needs it. Since TLB invalidates represent the bulk of the execution-time cost of mprotect(), mremap(), and munmap() calls, this gives us dramatically higher MBs-of-address-space-affected-per-second metrics.
You can find some early discussion and old numbers in the C4 paper <http://www.azulsystems.com/sites/default/files/images/c4_paper_acm.pdf>, including some reasoning for why a high map-changing rate is needed for sustaining reasonable allocation rates in collectors that perform such changes in the main/common-case compaction paths (see section 5 in the paper).

On Tuesday, May 2, 2017 at 8:39:12 PM UTC-7, Yichao Yu wrote:
>
> >> 3. Does transitivity work? i.e. if there's a thread 3 that's also loading
> >> from p, and if thread 2 faults on the load while thread 3 doesn't, can we
> >> say that thread 2 faults after the mprotect on thread 1 which is after the
> >> load on thread 3 and therefore the load (and fault) on thread 2 happens
> >> after the load on thread 3?
> >
> > Same as (2) above. You'd need to describe how threads 1 and 2 observe
> > whether or not thread 3's load has happened, and whether or not it has
> > faulted. Then, depending on how that information is communicated, you
> > would be able to establish some ordering.
>
> The "safe assumption" 2 above also made it more clear to me why I feel
> like this might not be true even if both of the previous ones might be
> true. I'm now imagining a naive implementation of mprotect where the
> access bit is sequentially flipped on all the threads from the thread
> issuing mprotect, i.e. mprotect(p, PROT_NONE) is really
>
> ```
> for (thread : all_threads) {
>     mprotect_on_thread(thread, p, PROT_NONE)
> }
> ```
>
> In that case for three threads executing (starting `*ga == 0`)
>
> ```
> // thread 1
> mprotect(p, PROT_NONE)
>
> // thread 2
> *(volatile int*)p;
> a = *ga; // in signal handler
>
> // thread 3
> *ga = 1;
> *(volatile int*)p;
> ```
>
> A possible "sequentially consistent" execution could be
>
> ```
> mprotect_on_thread(1, p, PROT_NONE) // Thread 1
> mprotect_on_thread(2, p, PROT_NONE) // Thread 1
> *(volatile int*)p; // Thread 2, this faults.
> a = *ga; // in signal handler // Thread 2
> *ga = 1; // Thread 3
> *(volatile int*)p; // Thread 3, this doesn't fault.
> mprotect_on_thread(3, p, PROT_NONE) // Thread 1
> ```
>
> After all three threads finish execution we'll have `a == 0`, even
> though the `a = *ga` appears to be executed after `mprotect` (though
> not after `mprotect` returns) and the `*ga = 1` appears to be executed
> before `mprotect`.
>
> Note that this naive model won't break 1 or 2 (both threads 2 and 3
> will still be synchronized with thread 1, since thread 1 cannot do
> anything before mprotect returns). Can this actually happen? (I'll
> probably test this myself later, though I've never tested anything like
> this involving more than two threads yet....)

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
