On 9 Mar 2025, at 09:04, Demi Marie Obenour via Devel <devel@sel4.systems> 
wrote:
> 
> On 3/8/25 4:49 PM, Gernot Heiser via Devel wrote:
>> 
>> 
>> Why so complicated?
>> 
>> The equivalence of putting the functionality in the kernel, where the kernel 
>> has access to all user state, is to have a trusted server which has access 
>> to all client state (especially CSpaces). The TCB is the same, but you don’t 
>> break verification and retain the option to verify the trusted server. This 
>> approach will work for all designs.
>> 
>> In a specific system design you’re likely to have opportunities to 
>> simplify/modularise/reduce the power of the server, making verification 
>> easier. But even the model of the all-powerful server is better (more 
>> assured) than the model of hacking the kernel and completely abandoning 
>> verification.
> 
> There are a couple classes of designs I am not sure how to implement 
> efficiently in this model:
> 
> 1. Systems that run on many loosely coupled cores where statically 
> partitioning
>   the kernel heap requires either reserving too much memory at startup, or 
> playing
>   silly tricks like adding and removing page table entries in response to page
>   faults to keep the kernel heap size down.

How does Linux do this?

As I keep saying, you’ll need to look at seL4 IPC (or PPC as I prefer to call 
it) as the seL4 equivalent of a Linux system call: the mechanism to switch 
protection state.

In Linux you don’t call into the kernel on a different core; you change 
protection state on your own core (and the kernel may choose to hand off to a 
worker thread somewhere else). In that model it’s clear that you invoke the 
server on the same core. Depending on the design, this may mean invoking a 
shared, passive server locally, which hands off to a worker. Or you invoke a 
per-core server (meaning there’s a server thread and endpoint per core, or per 
group of closely coupled cores). This is system-specific policy.

Trying to do this all in the kernel just forces policy in the kernel, which is 
exactly what a microkernel should avoid.

> 2. Systems where the syscall overhead is too high because of speculative
>   execution mitigations.  On x86, each context switch between userspace
>   processes requires an IBPB, but entering the kernel often doesn’t.

This is a non-sequitur. You need to take action for timing-channel prevention 
when you cross a security domain. Going into and out of the kernel doesn’t 
cross one, and as such doesn’t require such action. The same goes for going 
into and out of a trusted server. Preventing the server from leaking through 
any shared state (architected or not) is no different from preventing the 
kernel from leaking through shared state.

>  I  suspect there is other hardware that tags various internal data structures
>   with the current execution privilege as well, and requires costly flushes
>   when switching between mutually distrusting code running at the same
>   CPU privilege level.  The trusted server requires 2 such switches, whereas
>   kernel extensions only need 1.

The typical fallacy of trying to solve security problems “transparently” in 
hardware, which inevitably forces unnecessary work.

Having said that, the argument that syscall/IPC overhead limits performance 
doesn’t become more true by repeating it (though it’s seeing a renaissance in 
the scientific literature). I’ve yet to see a realistic use case where seL4 IPC 
cost is performance-limiting, except with poor user-level design.

Gernot
