[seL4] Re: On Common Infrastructure

Indan Zupancic Mon, 16 Dec 2024 04:50:28 -0800

Hello,

I just wanted to say that I agree with a lot of things that Wanja says
in this mail and that I think he makes some very good points about
common infrastructure requiring policy decisions.


On 2024-11-28 09:26, Wanja.Zaeske--- via Devel wrote:

While for the seL4 design having priority preemption may be essential,I see this as an OS design tradeoff. In particular looking through theglasses of safety certification, non-preemptive yet priority drivenscheduling certainly has some appeal compared to priority preemptivescheduling. Don't get me wrong, I just want to point out that prioritypreemption is neither the only nor necessarily the best design,especially for safety (where having no preemption greatly simplifiesreasoning while downsides like reduced performance and increasedlatency may be completely acceptable). I won't try to coerce anyoneinto building non-preemptive seL4, so I stop the topic here to notderail this conversation :D.

In seL4 that is a matter of configuration: Just use a period equal toyourwatchdog timeout value and you will practically have non-preemtivescheduling.And instead of a global watchdog, with seL4 you can use timeout faultsfor

more fine grained control. Or give all tasks a different priority.

One can create a system that implements the priorities as per therecommendations in the sDDF yet fails to consume its input fully, i.e.through setting the client PDs budget smaller than required to consumeone chunk of input. Priorities (as you pointed out), but alsoscheduling parameters (budget/period of partitions), external factors(rate at which data is coming in), and internal factors (system load,i.e. contention on the memory bus) all come to play for whether inputis fully consumed or not. For the serial-port this hopefully is not aconcern on any HW from this century, but there is of course other,higher bandwidth comms that are desirable. It is also this kind of"What if?" and "Which PD can adversely effect which other PD, and how?"that are relevant for building safety critical systems.


Yes, system design is hard and you need to know a lot of scheduling and
other seL4 details to make safety critical systems. But documenting all
intricacies of flow control, fault detection, buffering considerations
etc. is neither the task of the Microkit documentation, nor the seL4
manual. sDDF should document at least some of it though.

It would be nice if there was more general awareness of safety critical

system requirements by people making infrastructure like Microkit andsDDF,

basic things like fault recovery are currently missing. Especially in a

microkernel system, faults should not propagate through the system toothercomponents, so all infrastructure should be designed with faultcontainment

and recovery in mind.

Secondly, being completely policy-free is a double-sided sword. Ingeneral, I support the effort. However, if to use a more sophisticatedhardware I would need to make dozens of non-trivial design decisions(as the upstream implementation is completely un-opinionated/policyfree), this will hurt real-world adoption (especially if there is analternative that just has "the vanilla way of doing things" workwithout any decision paralysis). I think being 100% policy-free canlead to an SW landscape that is universally capable, but for whichthere exist no ecosystem of reusable components, because each and everycomponent is very individually designed for its specific use-case. Ithink Microkit is much more opinionated than seL4, and I think that iswhat makes it more useful to me. Being opinionated (albeit resulting intechnical compromise more easily) is the way to grow an ecosystem ofreusable components, and that is precisely the currency with which onecan foster greater adoption. Takeaway: policy-free (and simple tooperate!) drivers are great, but from time to time one should not defer_all_ decisions to the user, but set an opinionated precedence for_some_ of them. Also, with seL4 you already offer a product heavilyleaning towards "no technical compromise, but all the complexity andtightest possible coupling".


Strongly agreed with this.

One big problem is that having a lot of infrastructure that is not build
with a certain use case in mind quickly becomes useless, especially with
strong safety or security requirements.

Except for CPU usage and priorities, seL4 has all the required mechanism
to create modular, stand-alone reusable components. However, to let them
work together and foster re-use, a common system API/ABI is needed and a

way to tie everything together. If the infrastructure is good, peopleonly

need to replace some modules if they don't match requirements instead of
writing all infrastructure themselves.

I think it's necessary to split this into a system API and a system ABI:

An API makes it possible to avoid making system design decisions tooearly.


e.g. I'm not a fan of sDDF requiring multiple pointless PD transitions:
It's fine to have separation of concern, but if both components have the
same security and safety impact, then running them in different PDs is
just pointless overhead.

Same for time servers: Whether getting the time needs an IPC call orjust

a CPU instruction, with possibly getting extra info from shared memory,
should not be decided beforehand.

But having well defined common infrastructure is crucial. sDDF is a good
building block for communication, not only for drivers, but for system

services too. But it does not solve the API/ABI problem: That needs tobedefined per driver framework and system service. The challenge is tocreate

an API that is simple for simple systems, but has the flexibility to be
expanded with more complex functionality without breaking all existing
users. This is especially true for networking and file systems.

Microkit should expose all untyped memory and other resources it doesn't
use itself, so that it's easy to create a more dynamic system on top of

Microkit and launch stand-alone subsystems with a well-definedinterface.Practically such subsystem would look almost the same as a PD toMicrokit,but with UT and other additional resources (such subsystem could bebuild

using Microkit itself). If this is not supported, then using Microkit or
not becomes an all-or-nothing choice, reducing adoption. If you know you
can always get to the low-level seL4 mechanisms and add missing bits
yourself on top of Microkit, the choice to use it becomes much safer.

To some extent Lions OS (1) and SMOS (2) are a step in the rightdirection,

but they have a very long way to go.

Greetings,

Indan

1) https://lionsos.org
2) https://trustworthy.systems/projects/smos
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems

[seL4] Re: On Common Infrastructure

Reply via email to