Hello,
I just wanted to say that I agree with a lot of things that Wanja says
in this mail and that I think he makes some very good points about
common infrastructure requiring policy decisions.
On 2024-11-28 09:26, Wanja.Zaeske--- via Devel wrote:
While for the seL4 design having priority preemption may be essential,
I see this as an OS design tradeoff. In particular looking through the
glasses of safety certification, non-preemptive yet priority driven
scheduling certainly has some appeal compared to priority preemptive
scheduling. Don't get me wrong, I just want to point out that priority
preemption is neither the only nor necessarily the best design,
especially for safety (where having no preemption greatly simplifies
reasoning while downsides like reduced performance and increased
latency may be completely acceptable). I won't try to coerce anyone
into building non-preemptive seL4, so I stop the topic here to not
derail this conversation :D.
In seL4 that is a matter of configuration: Just use a period equal to
your
watchdog timeout value and you will practically have non-preemtive
scheduling.
And instead of a global watchdog, with seL4 you can use timeout faults
for
more fine grained control. Or give all tasks a different priority.
One can create a system that implements the priorities as per the
recommendations in the sDDF yet fails to consume its input fully, i.e.
through setting the client PDs budget smaller than required to consume
one chunk of input. Priorities (as you pointed out), but also
scheduling parameters (budget/period of partitions), external factors
(rate at which data is coming in), and internal factors (system load,
i.e. contention on the memory bus) all come to play for whether input
is fully consumed or not. For the serial-port this hopefully is not a
concern on any HW from this century, but there is of course other,
higher bandwidth comms that are desirable. It is also this kind of
"What if?" and "Which PD can adversely effect which other PD, and how?"
that are relevant for building safety critical systems.
Yes, system design is hard and you need to know a lot of scheduling and
other seL4 details to make safety critical systems. But documenting all
intricacies of flow control, fault detection, buffering considerations
etc. is neither the task of the Microkit documentation, nor the seL4
manual. sDDF should document at least some of it though.
It would be nice if there was more general awareness of safety critical
system requirements by people making infrastructure like Microkit and
sDDF,
basic things like fault recovery are currently missing. Especially in a
microkernel system, faults should not propagate through the system to
other
components, so all infrastructure should be designed with fault
containment
and recovery in mind.
Secondly, being completely policy-free is a double-sided sword. In
general, I support the effort. However, if to use a more sophisticated
hardware I would need to make dozens of non-trivial design decisions
(as the upstream implementation is completely un-opinionated/policy
free), this will hurt real-world adoption (especially if there is an
alternative that just has "the vanilla way of doing things" work
without any decision paralysis). I think being 100% policy-free can
lead to an SW landscape that is universally capable, but for which
there exist no ecosystem of reusable components, because each and every
component is very individually designed for its specific use-case. I
think Microkit is much more opinionated than seL4, and I think that is
what makes it more useful to me. Being opinionated (albeit resulting in
technical compromise more easily) is the way to grow an ecosystem of
reusable components, and that is precisely the currency with which one
can foster greater adoption. Takeaway: policy-free (and simple to
operate!) drivers are great, but from time to time one should not defer
_all_ decisions to the user, but set an opinionated precedence for
_some_ of them. Also, with seL4 you already offer a product heavily
leaning towards "no technical compromise, but all the complexity and
tightest possible coupling".
Strongly agreed with this.
One big problem is that having a lot of infrastructure that is not build
with a certain use case in mind quickly becomes useless, especially with
strong safety or security requirements.
Except for CPU usage and priorities, seL4 has all the required mechanism
to create modular, stand-alone reusable components. However, to let them
work together and foster re-use, a common system API/ABI is needed and a
way to tie everything together. If the infrastructure is good, people
only
need to replace some modules if they don't match requirements instead of
writing all infrastructure themselves.
I think it's necessary to split this into a system API and a system ABI:
An API makes it possible to avoid making system design decisions too
early.
e.g. I'm not a fan of sDDF requiring multiple pointless PD transitions:
It's fine to have separation of concern, but if both components have the
same security and safety impact, then running them in different PDs is
just pointless overhead.
Same for time servers: Whether getting the time needs an IPC call or
just
a CPU instruction, with possibly getting extra info from shared memory,
should not be decided beforehand.
But having well defined common infrastructure is crucial. sDDF is a good
building block for communication, not only for drivers, but for system
services too. But it does not solve the API/ABI problem: That needs to
be
defined per driver framework and system service. The challenge is to
create
an API that is simple for simple systems, but has the flexibility to be
expanded with more complex functionality without breaking all existing
users. This is especially true for networking and file systems.
Microkit should expose all untyped memory and other resources it doesn't
use itself, so that it's easy to create a more dynamic system on top of
Microkit and launch stand-alone subsystems with a well-defined
interface.
Practically such subsystem would look almost the same as a PD to
Microkit,
but with UT and other additional resources (such subsystem could be
build
using Microkit itself). If this is not supported, then using Microkit or
not becomes an all-or-nothing choice, reducing adoption. If you know you
can always get to the low-level seL4 mechanisms and add missing bits
yourself on top of Microkit, the choice to use it becomes much safer.
To some extent Lions OS (1) and SMOS (2) are a step in the right
direction,
but they have a very long way to go.
Greetings,
Indan
1) https://lionsos.org
2) https://trustworthy.systems/projects/smos
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems