On Mon, Apr 13, 2020, 7:42 AM Andrew Warkentin <andreww...@gmail.com> wrote:

> On 4/12/20, Heiser, Gernot (Data61, Kensington NSW)
> <gernot.hei...@data61.csiro.au> wrote:
> >
> > Sure, OS structure matters a lot, and I’m certainly known for telling
> > people consistently that IPC payloads of more than a few words are a
> > strong indicator of a poor design. Microkernel IPC should be considered
> > a protected function call mechanism, and you shouldn’t pass more
> > by-value data than you would to a C function (see
> > https://microkerneldude.wordpress.com/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/).
> >
>
> I would think that an RPC layer with run-time marshaling of arguments,
> as is used for the IPC transport layer on most microkernel OSes, would
> add some overhead even when it uses the underlying IPC layer properly,
> since it has to iterate over the argument list, determine the type of
> each argument, and copy it into a buffer, with the reverse happening on
> the receiving end. Passing around bulk unstructured/opaque data is
> quite common (e.g. for disk and network transfers), and an RPC-based
> transport layer adds unnecessary overhead and complexity to such use
> cases.
>
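To make that overhead concrete, a run-time marshaling layer ends up doing
roughly the following on every call, with the mirror image on the receive
side (a made-up sketch; none of these names come from any particular OS):

/* Made-up sketch of per-call marshaling work; arg_desc and the type tags
 * are illustrative only. */
#include <stddef.h>
#include <string.h>

struct arg_desc {
    int type;          /* tag describing the argument's type */
    size_t len;        /* size of the argument's data */
    const void *data;  /* pointer to the argument's value */
};

/* Walk the argument list, tagging and copying each one into the IPC buffer. */
static size_t marshal(char *buf, size_t cap,
                      const struct arg_desc *args, size_t nargs)
{
    size_t off = 0;
    for (size_t i = 0; i < nargs; i++) {
        if (off + sizeof args[i].type + args[i].len > cap)
            return 0;                              /* message too large */
        memcpy(buf + off, &args[i].type, sizeof args[i].type);
        off += sizeof args[i].type;
        memcpy(buf + off, args[i].data, args[i].len);
        off += args[i].len;
    }
    return off;  /* bytes handed to the kernel IPC path */
}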
> I think a better transport-layer design (for an L4-like kernel at
> least) is one that keeps RPC-like call-return semantics but exposes the
> message registers and a shared memory buffer almost directly, the only
> additions being a message length, a file offset, and a type code (to
> indicate whether a message is a short register-only message, a long
> message in the buffer, or an error), rather than marshaling. This is
> what I plan to do on UX/RT, which will have a Unix-like IPC transport
> layer API providing new read()/write()-like functions that operate on
> the message registers or the shared buffer rather than copying as the
> traditional versions do (the traditional versions will of course still
> be present, implemented on top of the "raw" versions).
>
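If I'm reading that right, the per-message contract boils down to a small
fixed header plus payload; roughly something like this (all names are
hypothetical, not actual UX/RT or seL4 definitions):

/* Hypothetical header for a minimally structured transport of this kind;
 * a sketch only, not the real API. */
#include <stdint.h>
#include <sys/types.h>

enum msg_kind {
    MSG_SHORT = 0,  /* payload fits entirely in message registers */
    MSG_LONG  = 1,  /* payload lives in the shared per-channel buffer */
    MSG_ERROR = 2   /* error reply; length field carries the error code */
};

struct msg_hdr {
    uint32_t kind;    /* enum msg_kind */
    uint32_t length;  /* payload length in bytes (or error code) */
    int64_t  offset;  /* file offset, for operations that have one */
};

/* read()/write()-like calls that hand out the registers/shared buffer
 * directly instead of copying through a caller-supplied buffer. */
ssize_t msg_send(int chan, const struct msg_hdr *hdr, const void *payload);
ssize_t msg_recv(int chan, struct msg_hdr *hdr, void **payload);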
> RPC with marshaling could easily still be implemented on top of such a
> transport layer (for complex APIs that need marshaling) with basically
> no overhead compared to an RPC-based transport layer.
>
> >
> > However, microkernel OS structure is very much determined by what
> > security properties you want to achieve, in particular, which
> > components to trust and which not. Less trust generally means more
> > overhead, and the cost of crossing a protection domain boundary is
> > the critical factor that determines this overhead for a given design.
> >
> It seems to be quite common for microkernel OSes to vertically split
> subsystems that really represent single protection domains. A good
> example is a disk stack. For the most part, all layers of a disk stack
> are dealing with the same data, just at different levels, so splitting
> them into separate processes usually just adds unnecessary overhead.
> Typically, one disk server process per device should be good enough as
> far as security goes. The only cases where there is any benefit at all
> to splitting a disk stack vertically are systems with multiple
> partitions or traditional LVMs that provide raw volumes, and sometimes
> systems with disk encryption. On a system with a single partition or an
> integrated LVM/FS like ZFS, and no disk encryption, there is typically
> no benefit to splitting up the disk stack.
>
> For systems where keeping partitions/LVs separated is important, it
> should be possible to run separate "lower" and "upper" disk servers,
> with the lower one containing the disk driver, partition driver, and
> LVM, and the upper one containing the FS driver and disk encryption
> layer, but this should not be mandatory (this is what I plan to do on
> UX/RT).
>
> A disk stack architecture like that of Genode, where the disk driver,
> partition driver, and FS driver are completely separate programs
> (rather than plugins that can be run in different configurations),
> forces this overhead on all use cases, even though the separation often
> provides no security or error-recovery benefit.
>
> > > It seems that most microkernel OSes follow the former model for
> > > some reason, and I'm not sure why.
> >
> > Which OSes? I’d prefer specific data points over vague claims.
> >
> In addition to Genode, prime examples would include Minix 3 and Fuchsia.
>
> QNX seems to be the main example of a microkernel OS that uses a
> minimally structured IPC transport layer (although still somewhat more
> structured than what UX/RT will have) and goes out of its way to avoid
> intermediary servers (its VFS doesn't act as an intermediary on read()
> or write(), and many subsystems are single processes). One paper back
> in the 90s benchmarked an old version as significantly faster than
> contemporary System V/386 on the same hardware for most of the APIs
> tested (although maybe that just means System V/386 was slow; I should
> see whether I can get similar results with later versions of QNX
> against BSD and Linux).
>

Speaking of UX/RT: I strongly suggest avoiding PID-based APIs in favor of
handle-based APIs, and allocating file descriptors in a way that is more
efficient than always picking the lowest free one. The former causes many
race conditions, and the latter causes a lot of synchronization overhead
(the POSIX lowest-free-descriptor rule effectively serializes allocation
across threads). io_uring is also a good API to consider.
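On the PID point, the classic failure mode is that a PID can be reused
between the moment a process learns it and the moment it acts on it, so
anything keyed by PID can end up hitting the wrong process; a handle names
the process object itself and cannot go stale that way. Linux's pidfd calls
are the retrofitted version of this. A rough illustration (Linux-specific,
kernel 5.3+ for pidfd_open, error handling mostly omitted):

/* PID-based vs. handle-based signalling on Linux; pidfd_open is Linux 5.3+,
 * pidfd_send_signal is 5.1+. */
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Racy: if the target exits and the kernel reuses its PID between the
 * caller learning the PID and this call, the signal hits the wrong process. */
static void signal_by_pid(pid_t pid)
{
    kill(pid, SIGTERM);
}

/* Handle-based: the pidfd keeps referring to the original process even if
 * its PID is later reused. (In a new design the handle would ideally come
 * from process creation itself, since looking it up by PID can still race.) */
static void signal_by_handle(pid_t pid)
{
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return;
    syscall(SYS_pidfd_send_signal, pidfd, SIGTERM, NULL, 0);
    close(pidfd);
}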

>