On Mon, Nov 18, 2013 at 1:50 PM, Apollon Oikonomopoulos <[email protected]> wrote:
> Hi Michele, list,
>
> I'll be commenting from scratch, since I have lost track of the
> different in-depth threads of discussion. If any of my questions have
> been answered, don't bother answering again, just NAK them :).

No problem. I'll resend an updated document including all the suggested modifications soon. > > On 11:41 Tue 12 Nov , Michele Tartara wrote: >> Add the document describing a new design for the OS installation process for >> new instances. >> >> Signed-off-by: Michele Tartara <[email protected]> >> --- >> doc/design-draft.rst | 1 + >> doc/design-os.rst | 318 >> ++++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 319 insertions(+) >> create mode 100644 doc/design-os.rst >> >> diff --git a/doc/design-draft.rst b/doc/design-draft.rst >> index c821292..3ed3852 100644 >> --- a/doc/design-draft.rst >> +++ b/doc/design-draft.rst >> @@ -20,6 +20,7 @@ Design document drafts >> design-daemons.rst >> design-hsqueeze.rst >> design-ssh-ports.rst >> + design-os.rst >> >> .. vim: set textwidth=72 : >> .. Local Variables: >> diff --git a/doc/design-os.rst b/doc/design-os.rst >> new file mode 100644 >> index 0000000..7a42a7f >> --- /dev/null >> +++ b/doc/design-os.rst >> @@ -0,0 +1,318 @@ >> +=============================== >> +Ganeti OS installation redesign >> +=============================== >> + >> +.. contents:: :depth: 3 >> + >> +This is a design document detailing a new OS installation procedure, more >> +secure, able to provide more features and easier to use for many common >> tasks >> +w.r.t. the current one. >> + >> +Current state and shortcomings >> +============================== >> + >> +As of Ganeti 2.10, each instance is associated with an OS definition. An OS >> +definition is a set of scripts (``create``, ``export``, ``import``, >> ``rename``) >> +that are executed with root privileges on the primary host of the instance >> to >> +perform all the OS-related functionality (setting up an operating system >> inside >> +the disks of the instance being created, exporting/importing the instance, >> +renaming it). 
>> + >> +These scripts receive, as environment variables, a fixed set of parameters >> +describing the instance (such as the hypervisor, the name of the instance, >> the >> +number of disks, and their location) and a set of user defined parameters. >> Each >> +of these parameters is also written into the configuration file of Ganeti, >> to >> +allow for future reinstalls of the instance, and in various log files, >> namely: > > The previous paragraph does not directly map to Ganeti concepts. I think > something like "..., a fixed set of Ganeti-specific parameters (instance > definition) and a set of OS-provider-specific parameters (OS > parameters), both of which are written in the main configuration file" > would help map the concepts there with Ganeti primitives. Right, that's more clear. > >> + >> +* node daemon log file: contains DEBUG strings of the ``/os_validate``, >> + ``/instance_os_add`` and ``/instance_start`` RPC calls. >> + >> +* master daemon log file: DEBUG strings related to the same RPC calls are >> stored >> + here as well. >> + >> +* commands log: the CLI commands that create a new instance, including their >> + parameters, are logged here. >> + >> +* RAPI log: the RAPI commands that create a new instances, including their >> + parameters, are logged here. >> + >> +* job logs: the job files stored in the job queue or in its archive contain >> the >> + parameters. >> + >> +The current situation presents a number of shortcomings: >> + >> +* Having the installation scripts run with root power on the nodes is a huge >> + security issue. >> + >> +* Ganeti cannot be used to create instances starting from user provided disk >> + images: even in the (hypothetical) case where the scripts are completely >> + secure and run not by root but by an unprivileged user with only the >> power to >> + mount arbitrary files as disk images, this is a security issue. 
It has >> been >> + proven that a carefully crafted file system might exploit kernel >> + vulnerabilities to gain control of the system. Therefore, directly >> mounting >> + images on the Ganeti nodes is not an option. >> + >> +* There is no way to inject files into an existing disk image. A common use >> case >> + is for the system administrator to provide a standard image of the >> system, to >> + be later personalized with the network configuration, private keys >> identifying >> + the machine, ssh keys of the users and so on. A possible workaround would >> be >> + for the scripts to mount the image (only if this is trusted!) and to >> receive >> + the configurations and ssh keys as user defined OS parameters. >> Unfortunately, >> + this is also not an option for security sensitive material (such as the >> ssh >> + keys) because the OS parameters are stored in many places on the system, >> as >> + already described above. >> + >> +* Most other virtualization software simply work with instance images, not >> with >> + installation scripts. This difference makes the interaction of Ganeti with >> + other softwares difficult. >> + >> +Proposed changes >> +================ >> + >> +In order to fix the shortcomings of the current state, we plan to introduce >> the >> +following changes: >> + >> +* Change the OS parameters to have three categories: > > "to have three categories, based on their visibility:" Ack. > >> + >> + * ``public``: the current behavior. The parameter is logged and stored >> freely. >> + >> + * ``private``: the parameter is saved inside the Ganeti configuration (to >> allow >> + for instance reinstall) but it is not shown in logs, job logs, or passed >> back >> + via RAPI. >> + >> + * ``secret``: the parameter is not saved inside the Ganeti configuration. >> + Reinstall are impossible unless the data is passed again. The parameter >> will >> + not appear in any log file. 
In order to preserve the functionality of >> Ganeti, >> + the parameters will still need to be stored in the job files, but they >> will >> + be removed from there when the job has finished running (either >> successfully >> + or not). >> + >> +* A new OS installation procedure, based on a safe virtualized environment. >> + This virtualized environment will run with the same hardware parameter as >> the >> + actual instance being installed, as much as possible. This will also >> allow to >> + reduce the memory usage in the host (specifically, in Dom0 for Xen >> + installations). Each instance will have these possible execution modes: >> + >> + * ``run``: the default mode, used when the machine is running normally. >> + >> + * ``self_install``: Ganeti will start the instance with a different set of >> + user-specified parameters, therefore allowing to attach an installation >> + floppy/cdrom/network, change the boot device order, or specify an OS >> image >> + to be used. The instance will then be responsible to get the parameters >> for >> + configuring itself (its network interfaces, IP address, hostname, etc.) >> from >> + a set of metadata provided to it by Ganeti (e.g.: using an approach >> + comparable to the one of the ``cloud-init`` tool). When this >> installation >> + mode is used, no OS installation script is required. >> + In order for installation of an OS from an image to be possible, a new >> + parameter ``--os-image`` will be added, allwoing to specify where to >> take >> + the image from. It will have to be mutually exclusive with >> ``--os-type``. If >> + ``--os-image`` is specified, ``--os-parameters`` can still be used, as >> it >> + will be passed to the instance as part of the metadata. >> + The set of ``self_install`` parameters will be stored as part of the >> + instance configuration, so that they can be used to reinstall the >> instance. 
>> +    It will be the user's responsibility to ensure that the OS image or any
>> +    installation media is still available in the proper position when a
>> +    reinstall happens.
>> +
>> +  * ``install``: Ganeti will start the instance using a virtual appliance
>> +    specifically made for installing Ganeti instances. Scripts analogous to the
>> +    current ones will run inside this instance. The disks of the instance being
>> +    installed will be connected to this virtual appliance, so that the scripts
>> +    can mount them and modify them as needed, as currently happens, but with the
>> +    additional protection given by this happening in a VM. The virtual appliance
>> +    will be started in a clean state every time a new instance need to be
>> +    created, to further increase security. Metadata will be provided also to
>> +    this virtual applicance, that will take care of converting them to
>> +    environment variables for the installation scripts.
>> +
>
> What is the difference between "install" and "self_install". Couldn't
> "install" be implemented as a Ganeti-supplied image for "self_install"?
> Is "self_install"'s aim to provide a drop-in replacement working with
> the existing OS providers?

I already partially clarified this in a couple of previous emails, and in
the updated document that I'll send soon, but let me try and reply
specifically to your question.

Existing OS providers don't need any appliance or instance to be run. So if
an instance has to be created by a traditional OS script, it will just be
created and then started in ``run`` mode, already ready to use, exactly as
happens today.

``self_install`` is just a mode where the first boot of the instance is
performed with a different set of parameters passed to the instance. These
parameters will have to specify some "installation media", like an
installation cdrom, or a particular network where it is possible to perform
a net boot. This installation media will have to be made in such a way that,
at boot, it reads the metadata provided to it and installs and configures
the VM itself. It's called self-install because there is no "external"
appliance.

Regarding ``install`` being a Ganeti-supplied image for self_install, it's
definitely an interesting technical possibility, but I'm not sure I want to
make a binding decision in this direction as part of the design document. I
think it's more of an implementation detail, and the design should maintain
a clear distinction between the case where there's a helper instance
(install) and the case where there is none (self_install).

>
> Should the following not be list items (as is done previously)?

Ack.

>
>> +In order to allow for the metadata to be sent inside the instance, a
>> +communication mechanism between the instance and the host will be created. This
>> +mechanism will be bidirectional (e.g.: to allow the setup process going on
>> +inside the instance to communicate its progress to the host). Each instance will
>> +have access exclusively to its own metadata, and it will be only able to
>> +communicate with its host over this channel.
>> +
>> +As part of the instance creation command it will be possible to indicate a URL
>> +for a "personalization package", that is an archive containing a set of files
>> +meant to be overlayed on top of the operating system file system at the end of
>> +the setup process, before the VM is started for the first time in ``run`` mode.
>> +Ganeti will provide a mechanism for receiving and unpacking this archive as part
>> +of the ``install`` execution mode, whereas in ``self_install`` mode it will only
>> +be provided as a metadata for the instance to use.
>
>> +The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or ``.tgz``)
>> +and will contain the files according to the directory structure that will be
>> +recreated on the installation disk.
Files contained in this archive will >> +overwrite files with the same path created during the install procedure (if >> +any). >> +The URL of the "personalization package" will have to specify an extesion to >> +identify the file format (in order to allow for more formats to be >> supported in >> +the future). >> +The URL will be stored as part of the configuration of the instance >> (therefore, >> +the URL should not contain confidential information, but the file there >> +available can). It is up to the system administrator to ensure that a >> package >> +is actually available at that URL at install and reinstall time. >> +The content of the package is allowed to change. E.g.: a system >> administrator >> +might create a package containing the private keys of the instance being >> +created. When the instance is reinstalled, a new package with new keys can >> be >> +made available there, therefore allowing instance reinstall without the >> need to >> +store keys. >> + >> +Implementation >> +============== >> + >> +The implementation of this design will happen as an ordered sequence of >> steps, >> +of increasing impact on the system and, in some cases, dependent on each >> other: >> + >> +#. Private and secret instance parameters >> +#. Communication mechanism between host and instance >> +#. Metadata service >> +#. Personalization package >> +#. ``self_install`` mode >> +#. ``install`` mode (with virtualization environment) >> + >> +Some of these steps need to be more deeply specified w.r.t. what is already >> +written in the `Proposed changes`_ Section. Extra details will be provided >> in >> +the following Subsections. >> + >> +Communication mechanism and metadata service >> +++++++++++++++++++++++++++++++++++++++++++++ >> + >> +The communication mechanism and the metadata service are described together >> +because they are deeply tied. 
On the other hand, the communication mechanism >> +will need to be more generic because it can be used for other reasons in the >> +future (like allowing instances to esplicitly send commands to Ganeti, or >> to let >> +Ganeti control a helper instance, like the one hereby introduced for >> performing >> +OS installs inside a safe environment). >> + >> +The communication mechanism will be enabled automatically when the instance >> is >> +in ``self_install`` or ``install`` mode, but for backwards compatibility it >> will >> +be disabled when the instance is in ``run`` mode unless it is esplicitly >> +requested at instance startup by using a new, ad-hoc, parameter >> +(``--communication``). >> + >> +When the communication mechanism is enabled, Ganeti will create a new >> network >> +interface inside the instance. This extra network interface will be the >> last one >> +of the instance, after all the user defined ones. On the host side, this >> +interface will be only accessible to the host itself, and not be routed >> outside >> +the machine. > > We should assume that this extra NIC will not be subject to the MAX_NICS > limitation, right? Yes, if it will actually end up being a NIC, it's definitely going to be a special one, managed separately. > >> +On this network interface, the instance will connect using the IP: >> +169.254.169.1 and netmask 255.255.255.0. >> +The host will be on the same network, with the IP address: 169.254.169.254. >> +The instance will be able to connect to 169.254.169.254:80, and issue GET >> +requests to an HTTP server that will provide the instance metadata. >> + >> +The choice of this IP address and port is done for compatibility reasons >> with >> +OpenStack's and Amazon EC2's ways of providing metadata to the instance. 
>> + >> +Where possible, the metadata will be provided in a way compatible with >> OpenStack >> +at:: >> + >> + http://169.254.169.254/openstack/<version>/meta_data.json >> + >> +or with Amazon EC2, at:: >> + >> + http://169.254.169.254/<version>/meta-data/* >> + >> +If some metadata are Ganeti-specific and don't fit this structure, they >> will be >> +provided at:: >> + >> + http://169.254.169.254/<version>/ganeti/meta_data.json >> + >> +``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to >> indicate >> +the most recent available protocol version. >> + >> +A bi-directional, pipe-like communication channel will be provided. The >> instance >> +will be able to receive data from the host by a GET request at:: >> + >> + http://169.254.169.254/<version>/ganeti/pipe_in >> + >> +and to send data to the host by a POST request at:: >> + >> + http://169.254.169.254/<version>/ganeti/pipe_out >> + >> +As in a pipe, once the data are read, they will not be in the buffer >> anymore, so >> +subsequent get request to ``pipe_in`` will not return the same data twice. >> +Unlike a pipe, though, it will not be possible to perform blocking I/O >> +operations. >> + >> +The OS parameters will be accessible through a GET >> +request at:: >> + >> + http://169.254.169.254/<version>/ganeti/os/parameters/<visibility>.json >> + >> +as a JSON serialized dictionary. ``<visibility>`` will be either ``public`` >> or >> +``private`` or ``secret``. >> + >> +The installation scripts to be run inside the virtualized environment while >> the >> +instance is run in ``install`` mode will be available at:: >> + >> + http://169.254.169.254/<version>/ganeti/os/scripts/<script_name> >> + >> +where ``<script_name>`` is the name of the script. >> + >> +The host and the instances (as detailed in `Installation process in a >> +virtualized environment`_) will be able to create other communication >> channels >> +on the other ports of the same IP address. 
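
To make the endpoint layout quoted above a bit more concrete, here is a
minimal sketch (not part of the design, just an illustration) of how a
guest-side helper might build the Ganeti-specific metadata URLs, and of the
read-once, non-blocking behaviour described for ``pipe_in``. The names
``metadata_url``, ``os_parameters_url`` and ``PipeBuffer`` are hypothetical;
only the IP address, the paths and the three visibility names are taken from
the document.

```python
from collections import deque

METADATA_HOST = "169.254.169.254"  # address given in the design document
VISIBILITIES = ("public", "private", "secret")


def metadata_url(path, version="latest"):
    """Build a Ganeti-specific metadata URL (hypothetical helper)."""
    return "http://%s/%s/ganeti/%s" % (METADATA_HOST, version, path.lstrip("/"))


def os_parameters_url(visibility, version="latest"):
    """URL of the JSON dictionary holding OS parameters of one visibility."""
    if visibility not in VISIBILITIES:
        raise ValueError("unknown visibility: %r" % (visibility,))
    return metadata_url("os/parameters/%s.json" % visibility, version)


class PipeBuffer:
    """In-memory model of the pipe semantics: data can be read only once."""

    def __init__(self):
        self._chunks = deque()

    def write(self, data):
        # Corresponds to one side pushing bytes into the channel.
        self._chunks.append(data)

    def read(self):
        # Never blocks: returns everything buffered so far (possibly b"")
        # and empties the buffer, so the same data is not returned twice.
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data
```

An actual client would issue a GET on ``metadata_url("pipe_in")`` and a POST
on ``metadata_url("pipe_out")``; the buffer class only models what the server
side would have to guarantee.
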
>> +
>> +
>> +Rationale
>> +---------
>> +
>> +The choice of using a network interface for instance-host communication, as
>> +opposed to VirtIO, XenBus or other methods, is due to the will of having a
>> +generic, hypervisor-independent way of creating a communication channel, that
>> +doesn't require unusual (para)virtualization drivers.
>> +At the same time, a network interface was preferred over solutions involving
>> +virtual floppy or USB devices because the latter tend to be detected and
>> +configured by the guest operating systems, sometimes even in prominent positions
>> +in the user interface, whereas it is fairly common to have an unconfigured
>> +network interface in a system, usually without any negative side effects.
>
> To recap the previous discussions about this: indeed, a TCP/IP stack is
> what every installer out there is almost certain to have nowadays.
> However, care must be taken that:
>
> a) the instance is properly "jailed" wrt. network resources
> b) this is done in a way not messing with the local sysadmin's policies
>    e.g. without overriding (parts of) the node firewall.
>
> Under KVM this can be done using
>
> '-net user,restrict=on,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080'
>
> however I don't know if something like this can be done in Xen (let
> alone LXC and chroot which would have to use netns with clean routing
> tables and firewall rules). These are all implementation details of
> course, but I think we should keep them in mind.

This behaves exactly in the best way possible for what we intend to do, so
having it for KVM is already a great step forward. Thanks a lot!
If I can find an analogous command line for Xen too (and I'll look for it in
the next few days), I guess we are good to go, given that, as far as I know,
LXC support is experimental and incomplete, so I doubt it's going to be a
roadblock.
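
For reference, the KVM option quoted above can be decomposed into its parts
as in the sketch below. It only rebuilds the exact string from Apollon's
mail; the function name and its parameters are hypothetical, and the option
has not been checked here against any specific KVM/QEMU version.

```python
def kvm_guestfwd_option(guest_addr="169.254.169.254", guest_port=80,
                        host_addr="127.0.0.1", host_port=8080):
    """Return the user-mode networking option quoted in the thread.

    guest_addr:guest_port is the address the instance connects to;
    host_addr:host_port is where the connection ends up on the node
    (assumed here to be a local metadata server).
    """
    return ("-net user,restrict=on,guestfwd=tcp:%s:%d-tcp:%s:%d"
            % (guest_addr, guest_port, host_addr, host_port))
```

With the default arguments this yields the same string as in the quoted
suggestion, so the only Ganeti-specific choice left would be the local port
the metadata server listens on.
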
>
> Plain character devices OTOH (such as virtio-serial or Xen PV serial)
> have the advantage of being de facto point-to-point links, whose data
> will not be interpreted by the kernel, nor forwarded due to policies
> outside Ganeti's control (e.g. IP forwarding).

True. But they might be more problematic in terms of support in some
operating systems. Furthermore, many tools already use network connections
and Amazon EC2's format for the metadata. Keeping compatibility with them
would be a huge plus.

>
>> +Installation process in a virtualized environment
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +In the new OS installation scenario, we distinguish between trusted and
>> +untrusted code.
>> +
>> +The trusted installation code maintains the behavior of the current one, with
>> +the scripts running on the node the instance is being created on. The untrusted
>> +code is stored in a subdirectory of the OS definition called ``untrusted``.
>> +This directory contains scripts that are equivalent to the already existing
>> +ones (``create``, ``export``, ``import``, ``rename``) but that will be run
>> +inside an virtualized environment, to protect the host from malicious tampering.
>> +
>> +The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
>> +code running operations that might be dangerous (such as mounting a
>> +user-provided image).
>> +
>> +In order to allow for the highest flexibility, if both a trusted and an
>> +untrusted script are provided for the same operation (i.e. ``create``), both of
>> +them will be executed at the same time, one on the host, and one inside the
>> +installation appliance. They will be allowed to communicate with each other
>> +through the already described communication mechanism, in order to orchestrate
>> +their execution (e.g.: the untrusted code might execute the installation, while
>> +the trusted one receives status updates from it and delivers them to a user
>> +interface).
>> +
>> +Ganeti will provide a script to be run at install time that can be used to
>> +create the virtualized environment that will perform the OS installation of new
>> +instances.
>> +This script will build a debootstrapped basic debian system including including
>> +a software that will read the metadata, setup the environment variables and
>> +launch the installation scripts inside the virtualized environment. The script
>> +will also provide hooks for personalization.
>> +
>> +It will also be possible to use other self-made virtualized environment, as long
>> +as they connect to ganeti over the described communication mechanism and they
>> +know how to read and use the provided metadata to create a new instance.
>> +
>> +While performing an installation in the virtualized environment, a
>> +personalizable timeout will be used to detect possible problems with the
>> +installation process, and to kill the virtualized environment.
>
> The "Proposed changes" section allows to assume that the whole
> installation will take place inside a virtualized environment, and that
> all needed information will be supplied over the communications channel.
> Can you clarify what is the need for/purpose of the trusted code?

Indeed, it will be possible (and probably recommended) to run all the code
as untrusted inside the VM. To see why the trusted code is kept as well,
have a look at Vangelis' emails in this thread, especially the second one.
It's mainly to meet a feature request of theirs, and for backwards
compatibility (even if that could also be achieved by running the script
inside the virtual appliance, if that is dealt with in the proper way).

>
> Also missing from the document is the compatibility with the existing OS
> providers. How will this work?

I extended this point in the upcoming version of the document.

Thanks a lot for all your feedback.

Michele

-- 
Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
