On 3/8/25 4:49 PM, Gernot Heiser via Devel wrote:
> On 8 Mar 2025, at 22:02, Andrew Warkentin via Devel <devel@sel4.systems> 
> wrote:
>>
>> On Sat, Mar 8, 2025 at 3:12 AM Gernot Heiser via Devel <devel@sel4.systems>
>> wrote:
>>
>>>
>>> I don’t see a reason for providing bulk data copying as a kernel service
>>> (despite Liedtke doing this in the original L4, in violation of his own
>>> minimality principle). It can (and therefore should) be provided as a
>>> user-level service, which is why we dropped it in seL4. And, done properly,
>>> it’ll easily outperform QNX’s in-kernel implementation.
>>>
>>
>> I can't come up with a way for such a service to determine the server's
>> address space short of tying endpoints to server address spaces, which is
>> something I want to avoid, since I want the option of handing off
>> connections to threads/processes with different address spaces. The other
>> issue is with mapping the source and destination address spaces. The most
>> "straightforward" ways I can think of to do this would be either to map
>> both source and destination buffers on every IPC, which would be extremely
>> inefficient (especially with the requirement to have a separate capability
>> for each VSpace a page or table is mapped into, meaning one system call
>> would potentially become hundreds or more depending on the buffer sizes),
>> or to keep an identity mapping of all of physical memory plus user-level
>> shadow page tables for looking up each page before copying it. The latter
>> could still be somewhat slower than what could be done in the kernel,
>> where you could reserve table-sized address space windows for the source
>> and destination address spaces, so that only two table updates are needed
>> if both buffers lie within a single table's address space. I did actually
>> end up using the kernel's physical address space window and software
>> walking of page tables in my initial in-kernel implementation, since that
>> seemed slightly easier to do initially, but at some point I want to
>> replace it with a table-window-based setup.
> 
> Why so complicated?
> 
> The equivalent of putting the functionality in the kernel, where the kernel 
> has access to all user state, is to have a trusted server which has access to 
> all client state (especially CSpaces). The TCB is the same, but you don’t 
> break verification and retain the option to verify the trusted server. This 
> approach will work for all designs.
> 
> In a specific system design you’re likely to have opportunities to 
> simplify/modularise/reduce the power of the server, making verification 
> easier. But even the model of the all-powerful server is better (more 
> assured) than the model of hacking the kernel and completely abandoning 
> verification.

There are a couple of classes of designs I am not sure how to implement
efficiently in this model:

1. Systems that run on many loosely coupled cores, where statically
   partitioning the kernel heap requires either reserving too much memory at
   startup, or playing silly tricks like adding and removing page table
   entries in response to page faults to keep the kernel heap size down
   (see the paging sketch after this list).

2. Systems where the syscall overhead is too high because of speculative
   execution mitigations.  On x86, each context switch between userspace
   processes requires an IBPB, but entering the kernel often doesn’t.  I
   suspect there is other hardware that tags various internal data structures
   with the current execution privilege as well, and requires costly flushes
   when switching between mutually distrusting code running at the same
   CPU privilege level.  The trusted server requires two such switches per
   request, whereas a kernel extension only needs one (see the IBPB sketch
   after this list).
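
The paging sketch referenced in point 1, just to make the trick concrete: a
user-level pager that bounds the number of live mappings (and so the
page-table objects, i.e. kernel memory, backing them) by mapping a frame
only when a client faults on it and unmapping a victim in return.
frame_for() and pick_victim() are hypothetical helpers, and I am assuming
AArch64 and the non-MCS API; error handling is elided.

#include <sel4/sel4.h>

extern seL4_CPtr fault_ep;        /* clients' fault endpoint              */
extern seL4_CPtr client_vspace;   /* the faulting client's VSpace root    */
extern seL4_CPtr frame_for(seL4_Word vaddr);  /* hypothetical lookup      */
extern seL4_CPtr pick_victim(void);           /* hypothetical eviction    */

void pager_loop(void)
{
    for (;;) {
        seL4_Word badge;
        seL4_MessageInfo_t tag = seL4_Recv(fault_ep, &badge);
        (void)tag;

        /* The VM fault message carries the faulting address in
         * seL4_VMFault_Addr; round it down to a page boundary. */
        seL4_Word addr = seL4_GetMR(seL4_VMFault_Addr) & ~(seL4_Word)0xfff;

        /* Unmap a victim first so the number of live mappings, and the
         * page tables needed to hold them, stays bounded. */
        seL4_CPtr victim = pick_victim();
        if (victim != seL4_CapNull) {
            seL4_ARM_Page_Unmap(victim);
        }

        /* Map the faulting page, then resume the client by replying to
         * the fault message. */
        seL4_ARM_Page_Map(frame_for(addr), client_vspace, addr,
                          seL4_ReadWrite, seL4_ARM_Default_VMAttributes);
        seL4_Reply(seL4_MessageInfo_new(0, 0, 0, 0));
    }
}

A real version would also retype and unmap the intermediate page-table
objects themselves on demand, which is where most of the kernel-memory
saving (and the silliness) lives.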
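
The IBPB sketch referenced in point 2: the indirect branch predictor
barrier an x86 kernel issues when it switches between mutually distrusting
user processes. The MSR number and command bit are architectural; when to
issue it is the kernel's policy.

#include <stdint.h>

#define MSR_IA32_PRED_CMD  0x49         /* IA32_PRED_CMD                  */
#define PRED_CMD_IBPB      (1ULL << 0)  /* indirect branch pred. barrier  */

/* Privileged: write a model-specific register. */
static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr"
                     :
                     : "c"(msr), "a"((uint32_t)val),
                       "d"((uint32_t)(val >> 32))
                     : "memory");
}

/* Issued when switching to a process that must not be able to steer the
 * previous process's indirect branches (Spectre v2).  A request through a
 * trusted user-level server implies two such switches; an in-kernel
 * extension implies only one. */
void ibpb_on_process_switch(void)
{
    wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}
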
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
