On Mon, Apr 13, 2020, 7:42 AM Andrew Warkentin <andreww...@gmail.com> wrote:
> On 4/12/20, Heiser, Gernot (Data61, Kensington NSW) <gernot.hei...@data61.csiro.au> wrote:
>
> > Sure, OS structure matters a lot, and I'm certainly known for telling people consistently that IPC payloads of more than a few words are a strong indicator of a poor design. Microkernel IPC should be considered a protected function call mechanism, and you shouldn't pass more by-value data than you would to a C function (see https://microkerneldude.wordpress.com/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/).
>
> I would think that an RPC layer with run-time marshaling of arguments, as is used as the IPC transport layer on most microkernel OSes, would add some overhead even if it uses the underlying IPC layer properly, since it has to iterate over the list of arguments, determine the type of each, and copy it to a buffer, with the reverse happening on the receiving end. Passing around bulk unstructured/opaque data is quite common (e.g. for disk and network transfers), and an RPC-based transport layer adds unnecessary overhead and complexity to such use cases.
>
> I think a better transport layer design (for an L4-like kernel at least) would be one that maintains RPC-like call-return semantics but exposes message registers and a shared memory buffer almost directly, with the only extra additions being a message length, a file offset, and a type code (to indicate whether a message is a short register-only message, a long message in the buffer, or an error), rather than using marshaling. This is what I plan to do on UX/RT, which will have a Unix-like IPC transport layer API providing new read()/write()-like functions that operate on message registers or the shared buffer rather than copying as the traditional versions do (the traditional versions will of course still be present, implemented on top of the "raw" versions).
>
> RPC with marshaling could still easily be implemented on top of such a transport layer (for complex APIs that need marshaling) with basically no overhead compared to an RPC-based transport layer.
>
> > However, microkernel OS structure is very much determined by what security properties you want to achieve, in particular, which components to trust and which not. Less trust generally means more overhead, and the cost of crossing a protection domain boundary is the critical factor that determines this overhead for a given design.
>
> It seems to be quite common for microkernel OSes to vertically split subsystems that really represent single protection domains. A good example is a disk stack. For the most part, all layers of a disk stack are dealing with the same data, just at different levels, so splitting them into processes just adds unnecessary overhead in most cases. Typically, one disk server process per device should be good enough as far as security goes. The only cases where there is any benefit at all to splitting a disk stack vertically are systems with multiple partitions or traditional LVMs that provide raw volumes, and sometimes also systems with disk encryption. On a system with a single partition or an integrated LVM/FS like ZFS and no disk encryption, there is typically no benefit to splitting up the disk stack.
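
Going back to the transport layer you describe above: to check that I'm picturing it correctly, here is a rough C sketch of that kind of minimal header plus register/shared-buffer read()/write() variants. All of the names here are made up for illustration; this is just my reading of your description, not the actual UX/RT API.

    /*
     * Sketch of a "thin" transport header: the only metadata carried
     * alongside the payload is a type code, a length, and a file offset.
     * Anything fancier is left to an optional RPC layer on top.
     * (Hypothetical names, not the UX/RT API.)
     */
    #include <stdint.h>
    #include <sys/types.h>

    enum msg_type {
        MSG_SHORT,  /* payload fits entirely in message registers */
        MSG_LONG,   /* payload lives in the shared per-channel buffer */
        MSG_ERROR   /* reply carries an error code instead of data */
    };

    struct msg_hdr {
        uint32_t type;    /* one of enum msg_type */
        uint32_t length;  /* number of payload bytes */
        off_t    offset;  /* file offset, for seekable objects */
    };

    /*
     * read()/write()-like calls that expose the registers or the shared
     * buffer directly instead of copying through a caller-supplied buffer;
     * the traditional copying read()/write() would be wrappers over these.
     */
    ssize_t msgwrite(int fd, const struct msg_hdr *hdr, const void *payload);
    ssize_t msgread(int fd, struct msg_hdr *hdr, void **payload);

An RPC layer that does want marshaling could then just format its arguments into the shared buffer and send a single MSG_LONG message, which I assume is what you mean by it adding basically no overhead over an RPC-based transport layer.
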
> For systems where keeping partitions/LVs separated is important, it should be possible to run separate "lower" and "upper" disk servers, with the lower one containing the disk driver, partition driver, and LVM, and the upper one containing the FS driver and disk encryption layer, but this should not be mandatory (this is what I plan to do on UX/RT).
>
> A disk stack architecture like that of Genode, where the disk driver, partition driver, and FS driver are completely separate programs (rather than plugins that may be run in different configurations), forces overhead on all use cases even though that overhead often provides no security or error recovery benefit.
>
> > > It seems that most microkernel OSes follow the former model for some reason, and I'm not sure why.
> >
> > Which OSes? I'd prefer specific data points over vague claims.
>
> In addition to Genode, prime examples would include Minix 3 and Fuchsia.
>
> QNX seems to be the main example of a microkernel OS that uses a minimally structured IPC transport layer (although still somewhat more structured than what UX/RT will have) and goes out of its way to avoid intermediary servers (its VFS doesn't act as an intermediary on read() or write(), and many subsystems are single processes). One paper back in the 90s benchmarked an old version as being significantly faster than contemporary System V/386 on the same hardware for most of the APIs tested (although maybe that just means System V/386 was slow; I should see if I can get similar results with later versions of QNX against BSD and Linux).

Speaking of UX/RT: I strongly suggest avoiding PID-based APIs in favor of handle-based APIs, and allocating file descriptors in a way that is more efficient than always picking the lowest free one. The first causes many race conditions, and the second causes a lot of synchronization overhead. io_uring is also a good API to consider.
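
On the PID- vs. handle-based point, Linux's pidfd interface is an existing example of the handle-based approach and shows the race that raw PIDs invite. A minimal Linux-specific sketch of the idea (UX/RT would presumably have its own native equivalent rather than this exact interface):

    /*
     * Signal a process through a handle (pidfd) instead of a raw PID.
     * kill(pid, sig) is racy in general: if the target has already exited
     * and been reaped, the kernel may reuse the PID and the signal lands
     * on an unrelated process.  A pidfd names the process object itself,
     * so no such reuse race exists.  Requires Linux 5.3+ for pidfd_open
     * and 5.1+ for pidfd_send_signal.
     */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            pause();            /* child just waits to be signalled */
            _exit(0);
        }

        int pidfd = syscall(SYS_pidfd_open, child, 0);
        if (pidfd < 0) {
            perror("pidfd_open");
            return 1;
        }
        if (syscall(SYS_pidfd_send_signal, pidfd, SIGTERM, NULL, 0) < 0)
            perror("pidfd_send_signal");

        close(pidfd);
        waitpid(child, NULL, 0);    /* reap the terminated child */
        return 0;
    }

On the descriptor-allocation point: the lowest-free rule is expensive because POSIX requires open() and friends to return the lowest-numbered unused descriptor, which forces a synchronized search over a shared table. If UX/RT is willing to relax that for its new "raw" calls, something like per-thread descriptor ranges or a simple per-process free list would avoid most of the contention.
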