Add the document describing a new design for the OS installation process for
new instances.
Signed-off-by: Michele Tartara <[email protected]>
Signed-off-by: Jose A. Lopes <[email protected]>
---
doc/design-draft.rst | 1 +
doc/design-os.rst | 399 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 400 insertions(+)
create mode 100644 doc/design-os.rst
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c821292..3ed3852 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -20,6 +20,7 @@ Design document drafts
    design-daemons.rst
    design-hsqueeze.rst
    design-ssh-ports.rst
+   design-os.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-os.rst b/doc/design-os.rst
new file mode 100644
index 0000000..5714efc
--- /dev/null
+++ b/doc/design-os.rst
@@ -0,0 +1,399 @@
+===============================
+Ganeti OS installation redesign
+===============================
+
+.. contents:: :depth: 3
+
+This is a design document detailing a new OS installation procedure that is
+more secure than the current one, provides more features, and makes many
+common tasks easier.
+
+Current state and shortcomings
+==============================
+
+As of Ganeti 2.10, each instance is associated with an OS definition. An OS
+definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
+that are executed with root privileges on the primary host of the instance to
+perform all the OS-related functionality (setting up an operating system inside
+the disks of the instance being created, exporting/importing the instance,
+renaming it).
+
+These scripts receive, as environment variables, a fixed set of parameters
+related to the instance (such as the hypervisor, the name of the instance, the
+number of disks, and their location) and a set of user defined parameters.
+These parameters are also written in the configuration file of Ganeti, to allow
+future reinstalls of the instance, and in various log files, namely:
+
+* node daemon log file: contains DEBUG strings of the ``/os_validate``,
+  ``/instance_os_add`` and ``/instance_start`` RPC calls.
+
+* master daemon log file: DEBUG strings related to the same RPC calls are
+  stored here as well.
+
+* commands log: the CLI commands that create a new instance, including their
+  parameters, are logged here.
+
+* RAPI log: the RAPI commands that create a new instance, including their
+  parameters, are logged here.
+
+* job logs: the job files stored in the job queue, or in its archive, contain
+  the parameters.
+
+The current situation presents a number of shortcomings:
+
+* Having the installation scripts run as root on the nodes doesn't allow
+  user-defined OS scripts, as they would pose a huge security issue.
+  Furthermore, even a script without malicious intentions might end up
+  disrupting a node because of a bug in it.
+
+* Ganeti cannot be used to create instances starting from user provided disk
+  images: even in the (hypothetical) case where the scripts are completely
+  secure and run not by root but by an unprivileged user with only the power to
+  mount arbitrary files as disk images, this is a security issue. It has been
+  proven that a carefully crafted file system might exploit kernel
+  vulnerabilities to gain control of the system. Therefore, directly mounting
+  images on the Ganeti nodes is not an option.
+
+* There is no way to inject files into an existing disk image. A common use
+  case is for the system administrator to provide a standard image of the
+  system, to be later personalized with the network configuration, private keys
+  identifying the machine, ssh keys of the users and so on. A possible
+  workaround would be for the scripts to mount the image (only if this is
+  trusted!) and to receive the configurations and ssh keys as user defined OS
+  parameters. Unfortunately, this is also not an option for security sensitive
+  material (such as the ssh keys) because the OS parameters are stored in many
+  places on the system, as already described above.
+
+* Most other virtualization software simply works with instance images, not
+  with installation scripts. This difference makes the interaction of Ganeti
+  with other software difficult.
+
+Proposed changes
+================
+
+In order to fix the shortcomings of the current state, we plan to introduce the
+following changes:
+
+* Change the OS parameters to have three categories:
+
+  * ``public``: the current behavior. The parameter is logged and stored freely.
+
+  * ``private``: the parameter is saved inside the Ganeti configuration (to
+    allow for instance reinstall) but it is not shown in logs, job logs, or
+    passed back via RAPI.
+
+  * ``secret``: the parameter is not saved inside the Ganeti configuration.
+    Reinstalls are impossible unless the data is passed again. The parameter
+    will not appear in any log file. When a functionality is performed jointly
+    by multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes
+    serializes jobs on disk and later reloads them. Secret parameters will not
+    be serialized to disk. They will be passed around as part of the LUXI calls
+    exchanged by the daemons, and only kept in memory, in order to reduce their
+    accessibility as much as possible. In case of failure of the master node,
+    these parameters will be lost and cannot be recovered because they are not
+    serialized. As a result, the job cannot be taken over by the new master.
+    This is an expected and accepted side effect of jobs with secret
+    parameters: if they fail, they'll have to be restarted manually.
+
+* A new OS installation procedure, based on a safe virtualized environment.
+  This virtualized environment will run with the same hardware parameters as
+  the actual instance being installed, as much as possible. This will also
+  reduce the memory usage on the host (specifically, in Dom0 for Xen
+  installations). Each instance will have these possible execution modes:
+
+  * ``run``: the default mode, used when the machine is running normally and
+    the OS installation procedure is run before starting the instance for the
+    first time.
+
+  * ``self_install``: the first run of the instance will use a different set
+    of parameters from all the successive runs. This set of "install
+    parameters" will make it possible, e.g., to attach an installation
+    floppy/cdrom/network, change the boot device order, or specify an OS image
+    to be used. Through this set of parameters, the administrator will have to
+    provide the hypervisor a way to find an installation medium for the
+    instance (e.g., a boot disk, a network image, etc). This medium will then
+    install the instance itself on the disks and will then be responsible for
+    getting the parameters for configuring it (its network interfaces, IP
+    address, hostname, etc.) from a set of metadata provided by Ganeti (e.g.:
+    using an approach comparable to the one of the ``cloud-init`` tool). When
+    this installation mode is used, no OS installation script is required. In
+    order for the installation of an OS from an image to be possible, the
+    ``--os-type`` parameter will be extended to support a new additional
+    format: ``--os-type image:<URL>`` will instruct Ganeti to take an image
+    from the specified position. For the initial implementation, URL can be
+    either a filename or a publicly accessible HTTP or FTP resource. Once the
+    instance image is received, it will be dd-ed onto the first disk of the
+    instance. When an image is specified, ``--os-parameters`` can still be
+    used, and its content will be passed to the instance as part of the
+    metadata. Note that, as part of the OS scripts, there is a file specifying
+    what parameters are expected. With OS images, though, none of the
+    traditional structure of OS scripts is in place, so there will be no check
+    regarding what parameters can be specified: they will all be passed, as
+    long as the ``--os-parameters`` string is syntactically valid. The set of
+    ``self_install`` parameters will be stored as part of the instance
+    configuration, so that they can be used to reinstall the instance. It will
+    be the user's responsibility to ensure that the OS image or any
+    installation media is still available in the proper position when a
+    reinstall happens. After the first run, the instance will revert to
+    ``run`` mode.
+
+  * ``install``: Ganeti will start the instance using a virtual appliance
+    specifically made for installing Ganeti instances. Scripts analogous to the
+    current ones will run inside this instance. The disks of the instance being
+    installed will be connected to this virtual appliance, so that the scripts
+    can mount them and modify them as needed, as currently happens, but with
+    the additional protection given by this happening in a VM. The disk of the
+    virtual appliance will be read only, so that a pristine copy of the
+    appliance can be started every time a new instance needs to be created, to
+    further increase security. The data the instance needs to write at runtime
+    will only be stored in RAM, and disappear as soon as the instance is
+    stopped. Metadata will also be provided to this virtual appliance, which
+    will take care of converting them to environment variables for the
+    installation scripts. After the first run, the instance will revert to
+    ``run`` mode.
+
+* In order to allow for the metadata to be sent inside the instance, a
+  communication mechanism between the instance and the host will be created.
+  This mechanism will be bidirectional (e.g.: to allow the setup process going
+  on inside the instance to communicate its progress to the host). Each
+  instance will have access exclusively to its own metadata, and it will be
+  only able to communicate with its host over this channel. More details will
+  be provided in the `Communication mechanism and metadata service`_ section.
+
+* As part of the instance creation command it will be possible to indicate a
+  URL for a "personalization package", that is an archive containing a set of
+  files meant to be overlaid on top of the operating system file system at the
+  end of the setup process, before the VM is started for the first time in
+  ``run`` mode. Ganeti will provide a mechanism for receiving and unpacking
+  this archive as part of the ``install`` execution mode, whereas in
+  ``self_install`` mode it will only be provided as metadata for the instance
+  to use. The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or
+  ``.tgz``) and will contain the files according to the directory structure
+  that will be recreated on the installation disk. Files contained in this
+  archive will overwrite files with the same path created during the install
+  procedure (if any). The URL of the "personalization package" will have to
+  specify an extension to identify the file format (in order to allow for more
+  formats to be supported in the future). The URL will be stored as part of the
+  configuration of the instance (therefore, the URL should not contain
+  confidential information, but the files available there can). It is up to the
+  system administrator to ensure that a package is actually available at that
+  URL at install and reinstall time. The content of the package is allowed to
+  change. E.g.: a system administrator might create a package containing the
+  private keys of the instance being created. When the instance is reinstalled,
+  a new package with new keys can be made available there, therefore allowing
+  instance reinstall without the need to store keys. Together with the URL, a
+  username and a password can be specified too. If the URL is an HTTP(S) URL,
+  they will be used as basic access authentication credentials to access that
+  URL. The username and password will not be saved in the config, and will have
+  to be provided again in case a reinstall is requested. The downloaded
+  personalization package will not be stored locally on the node for longer
+  than it is needed while unpacking it and adding its files to the instance
+  being created. The personalization package will be overlaid on top of the
+  instance filesystem after the scripts that created it have been executed. In
+  order for the files in the package to be automatically overlaid on top of the
+  instance filesystem it is required that the appliance is actually able to
+  mount the instance disks, therefore this will not work for every filesystem.
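+
+As an illustration of the ``--os-type image:<URL>`` syntax proposed above, an
+instance creation command might look like the following sketch. The node and
+instance names, the image URL and the ``timezone`` OS parameter are
+hypothetical, and the final command line may differ from what is shown here::
+
+  # Proposed syntax: fetch the image and dd it onto the first disk.
+  gnt-instance add -t plain -s 10G -n node1.example.com \
+    --os-type image:http://images.example.com/debian.img \
+    --os-parameters timezone=UTC \
+    instance1.example.com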
+
+Implementation
+==============
+
+The implementation of this design will happen as an ordered sequence of steps,
+of increasing impact on the system and, in some cases, dependent on each other:
+
+#. Private and secret instance parameters
+#. Communication mechanism between host and instance
+#. Metadata service
+#. Personalization package (inside a virtualization environment)
+#. ``self_install`` mode
+#. ``install`` mode (inside a virtualization environment)
+
+Some of these steps need to be more deeply specified w.r.t. what is already
+written in the `Proposed changes`_ Section. Extra details will be provided in
+the following subsections.
+
+Communication mechanism and metadata service
+++++++++++++++++++++++++++++++++++++++++++++
+
+The communication mechanism and the metadata service are described together
+because they are deeply tied. On the other hand, the communication mechanism
+will need to be more generic because it can be used for other reasons in the
+future (like allowing instances to explicitly send commands to Ganeti, or
+letting Ganeti control a helper instance, like the one hereby introduced for
+performing OS installs inside a safe environment).
+
+The communication mechanism will be enabled automatically when the instance is
+in ``self_install`` or ``install`` mode, but for backwards compatibility it
+will be disabled when the instance is in ``run`` mode unless it is explicitly
+requested. Specifically, a new parameter ``--communication`` (short version:
+``-C``), with possible values ``true`` or ``false`` will be added to
+``gnt-instance add`` and ``gnt-instance modify``. It will determine whether the
+instance will have a communication channel set up to interact with the host and
+to receive metadata. The value of this parameter will be saved as part of the
+configuration of the instance.
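+
+For example, assuming the parameter is implemented as proposed here, the
+channel could be switched on for an existing instance as follows (the instance
+name is hypothetical)::
+
+  gnt-instance modify --communication true instance1.example.com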
+
+When the communication mechanism is enabled, Ganeti will create a new network
+interface inside the instance. This additional network interface will be the
+last one in the instance, after all the user defined ones. On the host side,
+this interface will only be accessible to the host itself, and not routed
+outside the machine.
+On this network interface, the instance will connect using the IP:
+169.254.169.1 and netmask 255.255.255.0.
+The host will be on the same network, with the IP address: 169.254.169.254.
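+
+Inside a Linux guest, this interface could be configured by hand along the
+lines of the sketch below. The interface name ``eth1`` is an assumption (it
+depends on how many user defined interfaces the instance has), and in practice
+the guest's own network configuration tooling would normally do this::
+
+  # Assign the instance-side address and bring the interface up.
+  ip addr add 169.254.169.1/24 dev eth1
+  ip link set dev eth1 up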
+
+The way to create this interface depends on the specific hypervisor being used.
+In KVM, it is possible to create a network interface inside the instance
+without having a corresponding interface created on the host. Using a command
+like::
+
+  kvm -net nic -net \
+    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080
+
+a network interface will be created inside the VM, part of the 169.254.169.0/24
+network, where the host will be visible to the VM at address .253 and port 8080
+of the host will be reachable at 169.254.169.254:80.
+
+In Xen, unfortunately, such a capability is not present, and an actual network
+interface has to be created on the host (using the ``vif`` parameter of the Xen
+configuration file). Each instance will have its corresponding ``vif`` network
+interface on the host. These interfaces will not be connected to each other in
+any way, and Ganeti will not configure them to allow traffic to be forwarded
+beyond the host machine. The ``vif-route`` script of Xen might be helpful in
+implementing this.
+It will be the system administrator's responsibility to ensure that the extra
+firewalling and routing rules specified on the host don't allow this
+accidentally.
+
+The instance will be able to connect to 169.254.169.254:80, and issue GET
+requests to an HTTP server that will provide the instance metadata.
+
+This IP address and port were chosen for compatibility with the way OpenStack
+and Amazon EC2 provide metadata to their instances. The metadata will be
+provided by a single daemon, which will determine what instance the request
+comes from and reply with the metadata specific for that instance.
+
+Where possible, the metadata will be provided in a way compatible with Amazon
+EC2, at::
+
+ http://169.254.169.254/<version>/meta-data/*
+
+If some metadata are Ganeti-specific and don't fit this structure, they will be
+provided at::
+
+ http://169.254.169.254/ganeti/<version>/meta_data.json
+
+``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
+the most recent available protocol version.
+
+If needed in the future, this structure also allows us to support OpenStack's
+metadata at::
+
+ http://169.254.169.254/openstack/<version>/meta_data.json
+
+A bi-directional, pipe-like communication channel will be provided. The
+instance will be able to receive data from the host by a GET request at::
+
+ http://169.254.169.254/ganeti/<version>/read
+
+and to send data to the host by a POST request at::
+
+ http://169.254.169.254/ganeti/<version>/write
+
+As in a pipe, once the data are read, they will not be in the buffer anymore,
+so subsequent GET requests to ``read`` will not return the same data twice.
+Unlike a pipe, though, it will not be possible to perform blocking I/O
+operations.
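+
+From inside the instance, the two endpoints could be exercised with ``curl``
+as in the sketch below. The message text is hypothetical, since this document
+does not define a payload format::
+
+  # Consume whatever data the host has queued for this instance.
+  curl -s http://169.254.169.254/ganeti/latest/read
+
+  # Send data to the host; --data-binary makes curl issue a POST request.
+  curl -s --data-binary 'installation: 50% done' \
+    http://169.254.169.254/ganeti/latest/write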
+
+The OS parameters will be accessible through a GET
+request at::
+
+ http://169.254.169.254/ganeti/<version>/os/parameters.json
+
+as a JSON serialized dictionary having the parameter name as the key, and the
+pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the
+user-provided value of the parameter, and ``<visibility>`` is either
+``public``, ``private`` or ``secret``.
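+
+As a sketch, the parameters could be retrieved and pretty-printed from inside
+the instance with::
+
+  curl -s http://169.254.169.254/ganeti/latest/os/parameters.json \
+    | python3 -m json.tool
+
+A response might then have the following shape. The parameter names and values
+are made up, and representing each pair as a two-element JSON list is an
+assumption, since JSON has no tuple type::
+
+  {
+    "dhcp": ["yes", "public"],
+    "root_password": ["not-a-real-password", "secret"]
+  }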
+
+The installation scripts to be run inside the virtualized environment while the
+instance is run in ``install`` mode will be available at::
+
+ http://169.254.169.254/ganeti/<version>/os/scripts/<script_name>
+
+where ``<script_name>`` is the name of the script.
+
+
+Rationale
+---------
+
+The choice of using a network interface for instance-host communication, as
+opposed to VirtIO, XenBus or other methods, is motivated by the desire for a
+generic, hypervisor-independent way of creating a communication channel that
+doesn't require unusual (para)virtualization drivers.
+At the same time, a network interface was preferred over solutions involving
+virtual floppy or USB devices because the latter tend to be detected and
+configured by the guest operating systems, sometimes even in prominent
+positions in the user interface, whereas it is fairly common to have an
+unconfigured network interface in a system, usually without any negative side
+effects.
+
+
+Installation process in a virtualized environment
++++++++++++++++++++++++++++++++++++++++++++++++++
+
+In the new OS installation scenario, we distinguish between trusted and
+untrusted code.
+
+The trusted installation code maintains the behavior of the current one and
+requires no modifications, with the scripts running on the node the instance is
+being created on. The untrusted code is stored in a subdirectory of the OS
+definition called ``untrusted``. This directory contains scripts that are
+equivalent to the already existing ones (``create``, ``export``, ``import``,
+``rename``) but that will be run inside a virtualized environment, to protect
+the host from malicious tampering.
+
+The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
+code running operations that might be dangerous (such as mounting a
+user-provided image).
+
+By default, all new OS definitions will have to be explicitly marked as trusted
+by the cluster administrator (with a new ``gnt-os modify`` command) before they
+can run code on the host. Otherwise, only the untrusted part of the code will
+be allowed to run, inside the virtual appliance. For backwards compatibility
+reasons, when upgrading an existing cluster, all the installed OSes will be
+marked as trusted, so that they can keep running with no changes.
+
+In order to allow for the highest flexibility, if both a trusted and an
+untrusted script are provided for the same operation (e.g. ``create``), both of
+them will be executed at the same time, one on the host, and one inside the
+installation appliance. They will be allowed to communicate with each other
+through the already described communication mechanism, in order to orchestrate
+their execution (e.g.: the untrusted code might execute the installation, while
+the trusted one receives status updates from it and delivers them to a user
+interface).
+
+The cluster administrator will have an option to completely disable scripts
+running on the host, leaving only the ones running in the VM.
+
+Ganeti will provide a script to be run at install time that can be used to
+create the virtualized environment that will perform the OS installation of new
+instances.
+This script will build a basic debootstrapped Debian system including software
+that will read the metadata, set up the environment variables and launch the
+installation scripts inside the virtualized environment. The script will also
+provide hooks for personalization.
+
+It will also be possible to use other self-made virtualized environments, as
+long as they connect to Ganeti over the described communication mechanism and
+they know how to read and use the provided metadata to create a new instance.
+
+While performing an installation in the virtualized environment, a
+configurable timeout will be used to detect possible problems with the
+installation process, and to kill the virtualized environment. The timeout will
+be optional and set on a cluster basis by the administrator. If set, it will be
+the total time allowed to set up an instance inside the appliance. It is mainly
+meant as a safety measure to prevent an instance taken over by malicious
+scripts from being available for a long time.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End: