juli pushed a commit to branch wip-goblinsify
in repository shepherd.

commit 4df21b6d7838d3bb1374b73b974ecc72b6066f3e
Author: Juliana Sims <j...@incana.org>
AuthorDate: Fri Oct 11 20:03:01 2024 -0400

    Add design doc.

    * goblins-port-design-doc.org: New file.
---
 goblins-port-design-doc.org | 271 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)

diff --git a/goblins-port-design-doc.org b/goblins-port-design-doc.org
new file mode 100644
index 0000000..4d6f884
--- /dev/null
+++ b/goblins-port-design-doc.org
@@ -0,0 +1,271 @@
+* Overview
+
+The purpose of this project is to rewrite [[https://www.gnu.org/software/shepherd/][the GNU Shepherd]] using the [[https://spritely.institute/goblins/][Spritely
+Goblins]] object-capability security (ocaps) library. This rewrite is possible
+because both Goblins and the Shepherd are written in the [[https://www.gnu.org/software/guile/][Guile]] dialect of the
+[[https://www.scheme.org/][Scheme]] programming language. The upshot of this rewrite will be both increased
+interprocess security and new abilities for intercommunication between Shepherd
+dæmons running on separate machines.
+
+The general architecture of the Shepherd will change as little as possible.
+Some code which Goblins makes redundant will be removed; new code required for
+networked communication (mostly around handling remote connections and creating
+facets etc.) will be added. The central design restriction is that the
+user-facing API will remain backwards-compatible. That is, existing Shepherd
+service definitions will still be valid, and new versions of the ~herd~ CLI will
+be able to communicate with older Shepherd dæmons -- and vice-versa.
+
+A thorough explanation of the object-capability security paradigm is beyond the
+scope of this document. You can read about the paradigm as implemented by
+Goblins in Spritely's [[https://spritely.institute/static/papers/spritely-core.html]["Heart of Spritely" whitepaper]]. However, this document
+does include an appendix explaining core ocaps ideas and idioms relevant to
+understanding the present discussion.
+
+* Design Description
+
+The Shepherd's current architecture already takes inspiration from the actor
+model. As such, most of the work to be done is translation from the extant form
+of actors to the Goblins form of actors. In general, Shepherd actors are
+represented by a record type to hold state, a core procedure reading messages
+from a [[https://github.com/wingo/fibers][Fibers]] [[https://github.com/wingo/fibers/wiki/Manual#23-channels][channel]] in a loop, and procedures to send the appropriate
+messages. Sometimes the record type is unneeded and the core loop procedure
+manages all state internally. In this rewrite, we typically represent Goblins
+actors with a combination of the [[https://spritely.institute/files/docs/guile-goblins/0.14.0/define_002dactor.html][~define-actor~]] macro and the [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Methods.html][~methods~]] macro.
+Immutable state is left in the constructor's closure to be referenced as though
+it were a procedure argument while mutable state is stored in a [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Cell.html][cell]].
+
+What follows is a description of the expected major changes this port will bring
+about. It does not aim to be exhaustive, and it does not aim to be unyielding.
+The descriptions of the current state below are to help ensure understanding
+while the description of projected work is subject to substantial change. As a
+final caveat, this document does not endeavor to discuss every change that will
+be part of this port but rather focuses on the major components which will see
+significant change.
+
+** Service and Service Controller Actor
+
+*** Current State
+
+The Shepherd's central datatype is the service actor which represents a Shepherd
+service and its relevant state. It is exposed to users in the ~service~
+procedure which handles construction.
+There is also a related actor, the
+service controller, which serves as an intermediary for messages to the service
+actor to avoid potential deadlocks in the core Shepherd fiber.
+
+Services advertise their role and functionality using their ~provision~ list.
+This is a list of symbols, the first of which becomes the actor's canonical
+name. Services may rely upon other services, represented by the complementary
+~requirement~ list. This is also a list of symbols, corresponding to symbols in
+the ~provision~ list of some other actor or actors. The service registry
+handles dependency resolution; see the Service Registry section for further
+discussion of dependency resolution.
+
+*** Rewrite Goals
+
+The Goblins port will rewrite the service actor as a Goblins actor. The
+~service~ procedure will be preserved as a wrapper around actor construction.
+The service controller will be removed and its functionality split between the
+new service actor and the service registry as appropriate.
+
+** Service Registry
+
+*** Current State
+
+The service registry actor keeps track of all services known to a given Shepherd
+dæmon as well as exposing them as needed. It is not exposed to users, but it is
+given references to service actors in ~register-services~. It is also
+responsible for starting new services during registration.
+
+*** Rewrite Goals
+
+The core service registry will be ported relatively directly. The main
+difference is that the registry will need to pass a facet of itself to service
+actors so that they may communicate their current state. Similarly, support for
+remote intercommunication between Shepherd dæmons will require the creation of a
+service registry facet which only exposes certain service actors. This same
+facet can be used locally to allow multiple users to share the same Shepherd
+dæmon. This will likely involve the creation of new methods for handshaking
+between registries and, by extension, Shepherd dæmons.
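
To make the idea of a restricted registry facet concrete, here is a minimal sketch. It assumes Guile Goblins 0.14 with the ~(goblins)~ and ~(goblins actor-lib methods)~ modules available. The names ~^registry~ and ~^registry-view~ and their ~register~ and ~lookup~ methods are hypothetical, invented for illustration; they are not the Shepherd's registry API, and plain symbols stand in for service actors.

```scheme
(use-modules (goblins)
             (goblins actor-lib methods))

;; Hypothetical registry: an association list from a service name
;; (a symbol) to a service reference.  Not the Shepherd's real API.
(define (^registry bcom services)
  (methods
   ;; Become a registry that also knows about NAME.
   ((register name service)
    (bcom (^registry bcom (acons name service services))))
   ;; Return the service registered under NAME, or #f.
   ((lookup name)
    (assq-ref services name))))

;; Hypothetical facet: it forwards lookups only for names listed in
;; ALLOWED-NAMES and offers no `register' method at all.
(define (^registry-view bcom registry allowed-names)
  (methods
   ((lookup name)
    (and (memq name allowed-names)
         ($ registry 'lookup name)))))

(define vat (spawn-vat))

(define lookups
  (with-vat vat
    (let* ((registry (spawn ^registry '()))
           (view (spawn ^registry-view registry '(sql))))
      ;; Symbols stand in for real service actors here.
      ($ registry 'register 'sql 'sql-service)
      ($ registry 'register 'sshd 'sshd-service)
      (list ($ view 'lookup 'sql)
            ($ view 'lookup 'sshd)))))
```

A holder of the view can look up ~'sql~ but receives ~#f~ for ~'sshd~ and cannot register anything. The sketch uses ~$~ because everything lives in one vat; a view handed to another dæmon would be called with ~<-~ instead.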
+
+The exact interrelation between local and remote registries remains to be
+explored. However, the general idea is that a local registry will check if it
+is able to respond successfully to a message. If it is not, it will ask remote
+registries it knows about if they can respond to the message and forward their
+response. This may use [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Pubsub.html][pubsub]] or some other mechanism, such as querying a
+registry to learn what capabilities it provides.
+
+One new mechanism of interest may be the ability to register remote services.
+This could take two forms: placing a reference to an extant remote service in
+the local registry, and asking a local registry to register and launch a service
+on a remote host. The exact API remains to be explored based on need.
+
+*** Open Questions
+
+The service registry is the central point of control at the time when
+dependencies are resolved. It is unclear how remote versus local dependencies
+should be resolved. If a service asks for ~'sql~, for example, and there are
+multiple services known to a registry to provide ~'sql~, which one should be
+chosen?
+
+There are a few options here. We could introduce new syntax to specify that a
+requirement must be local or must be remote, or to specify a specific service
+actor to fulfill a requirement (e.g. with a sturdyref). We could specify
+different kinds of requirements and provisions, indicating whether a given
+provision is interchangeable or not (e.g. ~'sql~ would likely require a specific
+~'sql~ database, but ~'tor~ may be able to route external traffic through any
+~'tor~ dæmon on the local network). More exploration and input is needed.
+
+** Starting and Monitoring Processes
+
+*** Current State
+
+Services are started by users in configuration files using ~start-service~ or
+~start-in-the-background~.
+Starting a service usually implies starting an
+operating system process that will either execute and exit, if it is a one-shot
+service, or be registered with the ~process-monitor~ actor which listens for
+signals from associated processes. The process monitor is a procedure composed
+of a core loop that listens for messages. Internally, it maintains a list of
+~waiters~ which are process IDs and Fibers channels to inform when a process
+terminates.
+
+*** Rewrite Goals
+
+Starting services will work much as it does now, with the primary
+changes being around spawning Goblins actors as opposed to Shepherd actors. The
+process monitor will become a Goblins actor. ~waiters~ can be replaced using
+the pubsub idiom, and it may be possible to express most of the process monitor
+in terms of pubsub.
+
+** Commandline Interface
+
+*** Current State
+
+The primary interface to a running Shepherd dæmon is the ~herd~ command. This
+script handles and forwards certain messages to services using s-expressions to
+communicate with the Shepherd dæmon proper. As part of this interface, ~herd~
+has the concept of a live service, which it represents with a Shepherd-style
+actor.
+
+*** Rewrite Goals
+
+The live service actor will be translated to a Goblins actor. Otherwise, the
+primary change to ~herd~ will be to route messages through a proxy of the Shepherd
+interface which is only aware of the services the user running the script has
+created. This will take the place of separate Shepherd dæmons for each user.
+There will also be new arguments and commands necessary to manage remote
+communication, though most of the handshaking and capability transfer process
+will be handled in configuration code.
+
+* Closed Questions
+
+This section covers problems which have been resolved or which only seemed to be
+problems.
+
+** Transactionality and Memory
+
+Goblins transactionality compresses its ~transactormap~ objects to the same size
+they would be even without the transactionality. Only the debugger causes
+~bcom~ to grow memory.
+
+** Code Inversion
+
+Object-capability security inverts many ACL concepts of control flow and
+authority as part of its security mechanisms. This can lead to unfamiliar code
+and resemble the situation known as "callback hell." To mitigate this issue, the
+~let-on~ macro exists, allowing traditional-looking Scheme code that expands to
+ocaps-style ~on~.
+
+* Open Questions
+
+This section covers problems which have not been resolved or which require
+further consideration.
+
+** Inbox Overflow
+
+Goblins message-passing could theoretically result in a situation where a given
+vat is unable to accept more messages, but remote actors don't know this -- or
+maliciously take advantage of this as part of an attack. While this situation
+has never been encountered in practice, it is a subject of active consideration
+within Spritely and the broader [[https://ocapn.org/][OCapN]] community.
+
+* Ocaps Appendix
+
+Goblins, as an implementation of the object-capability security paradigm, has a
+long history and rich heritage of [[http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing/Capability_Patterns][design patterns and idioms]], some of which are
+provided directly in the core implementation and some of which are provided in
+its [[https://spritely.institute/files/docs/guile-goblins/0.14.0/actor_002dlib.html][~actor-lib~]] library. Here we discuss some core ocaps ideas to better
+understand the rest of this document. We assume basic familiarity with the
+Scheme programming language; see [[https://spritely.institute/static/papers/scheme-primer.html][A Scheme Primer]] for this if needed.
+
+** Actors
+
+The central idiom of ocaps is the object.
+These objects are inspired by the
+[[https://en.wikipedia.org/wiki/Actor_model][actor model]] so they are frequently called "actors" in Goblins to avoid confusion
+with other kinds of objects. Fundamentally, actors receive messages,
+represented in Goblins by methods. In response to messages, an actor can only
+send a new message, create a new actor, or update its internal state. By
+restricting access to an actor to those actors which hold a capability to it,
+certain security guarantees can be made.
+
+** Capabilities
+
+In ocaps, a capability is simply a reference to an actor. On a local machine,
+this takes the shape of a standard object reference in the host language --
+usually a pointer. Remote actors are usually represented by a local actor which
+simply forwards messages to the remote actor. An actor can only receive
+capabilities at creation, by creating the actor associated with the capability,
+or by being given a capability in a message from another actor which already has
+the capability.
+
+** Messages and Methods
+
+Strictly speaking, Scheme function application is message-passing in Goblins.
+However, idiomatic Goblins relies heavily on the [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Methods.html][~methods~]] macro to allow actors
+to accept multiple different messages. This macro produces a lambda accepting
+as its first argument a method name, which is itself a symbol. These methods
+may then accept arguments as would any regular lambda.
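
As an illustration of the ~methods~ idiom described above, here is a minimal sketch of an actor that accepts two messages. It assumes Guile Goblins 0.14 with the ~(goblins)~ and ~(goblins actor-lib methods)~ modules; the ~^counter~ actor is hypothetical and exists only for this example.

```scheme
(use-modules (goblins)
             (goblins actor-lib methods))

;; Hypothetical actor for illustration; not part of the Shepherd.
;; COUNT lives in the constructor's closure.  `bcom' lets the actor
;; "become" a new behavior closing over an updated count.
(define (^counter bcom count)
  (methods
   ;; ($ counter 'get) returns the current count.
   ((get) count)
   ;; ($ counter 'add n) becomes a counter holding the new total and
   ;; returns that total to the caller.
   ((add n)
    (let ((total (+ count n)))
      (bcom (^counter bcom total) total)))))

(define vat (spawn-vat))

;; The method name is passed as the first argument, a symbol, and
;; the remaining arguments follow as for any procedure call.
(define result
  (with-vat vat
    (let ((counter (spawn ^counter 0)))
      ($ counter 'add 2)
      ($ counter 'add 3)
      ($ counter 'get))))
```

The calls use ~$~ because the caller and the counter share a vat.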
+
+** Asynchronous Message Passing and Promises
+
+Unlike the actor model, Goblins provides mechanisms for both [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Synchronous-calls.html][synchronous message
+passing]], represented by the ~$~ operator, which acts like normal function
+invocation in that it executes immediately and returns a normal value; and
+[[https://spritely.institute/files/docs/guile-goblins/0.14.0/Asynchronous-calls.html][asynchronous message passing]], represented by the ~<-~ operator, which returns a promise
+(ocaps promises inspired [[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise][JavaScript promises]]) that can be fulfilled or broken.
+If fulfilled, a promise will hold a normal value that can be accessed using
+specific mechanisms, centrally [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Asynchronous-calls.html#index-on][~on~]]; if broken, a promise will error out.
+
+*** Vats
+
+The core innovation which allows ocaps to have both synchronous and asynchronous
+invocation is the vat. A vat is essentially an event loop which passes messages
+to and between actors spawned "in" that vat. Actors can use ~$~ for any actor
+in the same vat, and must use ~<-~ for all other actors -- though they may use
+~<-~ for actors in the same vat as well. While individual actors only know
+about other individual actors, the vat abstracts and centralizes message passing
+for a group of actors, meaning that messages to a remote actor are actually sent
+to its vat. As a corollary, vats operate in only one operating system process
+or thread -- concurrency is achieved by communication between vats.
+
+*** Asynchronous Idioms
+
+Unlike JavaScript, ocaps has a concept of "promise pipelining" which allows
+local code to specify messages it would like to send in response to the
+resolution of a promise alongside the initial message.
+This allows the creation
+of several idioms to cut down on so-called "callback hell," all built upon ~on~.
+The most common of these in Goblins is [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Let_002dOn.html][~let-on~]]. This macro allows
+users to write code that looks like regular Scheme ~let~ expressions but which
+expands to use ~on~. Relatedly, a variety of [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Joiners.html][joiner]] macros facilitate promise
+pipelining across multiple actors.
+
+** Cells
+
+[[https://spritely.institute/files/docs/guile-goblins/0.14.0/Cell.html][Cells]] are a state isolation tool inherited directly from the original actor
+model literature. They are simply actors which hold a value. In response to a
+message, they either return or update their value.
+
+** Proxies and Facets
+
+Proxies are actors which represent other actors. Remote actors, for example,
+are represented by local proxies. [[https://spritely.institute/files/docs/guile-goblins/0.14.0/Facet.html][Facets]] are special proxies which limit which
+messages are passed to the proxied actor.
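
The cell and facet ideas can be sketched together in a few lines. The example assumes Guile Goblins 0.14 with the ~(goblins)~ and ~(goblins actor-lib methods)~ modules. ~actor-lib~ provides its own cell and facet actors; the hand-written ~^simple-cell~ and ~^read-only~ below are hypothetical stand-ins meant only to show the shape of the idioms, not how the library implements them.

```scheme
(use-modules (goblins)
             (goblins actor-lib methods))

;; Hand-written cell for illustration: it holds one value and either
;; returns it or becomes a cell holding a new one.
(define (^simple-cell bcom value)
  (methods
   ((get) value)
   ((set new-value)
    (bcom (^simple-cell bcom new-value)))))

;; Hand-written facet for illustration: it forwards `get' to the
;; wrapped cell and defines no `set' method, so a holder of this
;; reference can read the value but has no way to change it.
(define (^read-only bcom cell)
  (methods
   ((get) ($ cell 'get))))

(define vat (spawn-vat))

(define seen
  (with-vat vat
    (let* ((cell (spawn ^simple-cell 'stopped))
           (reader (spawn ^read-only cell))
           (before ($ reader 'get)))
      ;; Only code holding the full cell reference can update it;
      ;; the facet still observes the new value afterwards.
      ($ cell 'set 'running)
      (list before ($ reader 'get)))))
```

Handing out ~reader~ rather than ~cell~ is the whole security decision: authority is limited by which reference an actor is given, not by a check at call time.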