On 9 Mar 2025, at 09:04, Demi Marie Obenour via Devel <devel@sel4.systems> wrote:

> On 3/8/25 4:49 PM, Gernot Heiser via Devel wrote:
>>
>> Why so complicated?
>>
>> The equivalent of putting the functionality in the kernel, where the kernel
>> has access to all user state, is to have a trusted server which has access
>> to all client state (especially CSpaces). The TCB is the same, but you don’t
>> break verification, and you retain the option to verify the trusted server.
>> This approach will work for all designs.
>>
>> In a specific system design you’re likely to have opportunities to
>> simplify/modularise/reduce the power of the server, making verification
>> easier. But even the model of the all-powerful server is better (more
>> assured) than the model of hacking the kernel and completely abandoning
>> verification.
>
> There are a couple of classes of designs I am not sure how to implement
> efficiently in this model:
>
> 1. Systems that run on many loosely coupled cores, where statically
> partitioning the kernel heap requires either reserving too much memory at
> startup, or playing silly tricks like adding and removing page-table entries
> in response to page faults to keep the kernel heap size down.
How does Linux do this? As I keep saying, you’ll need to look at seL4 IPC (or
PPC, as I prefer to call it) as the seL4 equivalent of a Linux system call: the
mechanism to switch protection state. In Linux you don’t call into the kernel
on a different core, you change protection state on your own core (and the
kernel may choose to hand off to a worker thread somewhere else).

In that model it’s clear that you invoke the server on the same core. Depending
on the design, this may mean invoking a shared, passive server locally, which
hands off to a worker. Or you invoke a per-core server (meaning there’s a
server thread and endpoint per core, or per group of closely coupled cores).
This is system-specific policy. Trying to do this all in the kernel just forces
policy into the kernel, which is exactly what a microkernel should avoid.

> 2. Systems where the syscall overhead is too high because of speculative
> execution mitigations. On x86, each context switch between userspace
> processes requires an IBPB, but entering the kernel often doesn’t.

This is a non sequitur. You need to take action for timing-channel prevention
when you cross a security domain; going into and out of the kernel doesn’t
cross one, and as such doesn’t require such action. The same holds for going
into and out of a trusted server. Preventing the server from leaking through
any shared state (architected or not) is no different from preventing the
kernel from leaking through shared state.

> I suspect there is other hardware that tags various internal data structures
> with the current execution privilege as well, and requires costly flushes
> when switching between mutually distrusting code running at the same CPU
> privilege level. The trusted server requires 2 such switches, whereas kernel
> extensions only need 1.

This is the typical fallacy of trying to solve security problems
“transparently” in hardware, which always forces doing things unnecessarily.
Having said that, the argument that syscall/IPC overhead limits performance
doesn’t become more true by repeating it (though it’s seeing a renaissance in
the scientific literature, too). I have yet to see a realistic use case where
seL4 IPC cost is performance-limiting, except with poor user-level design.

Gernot

_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems