This patch adds a design document detailing the implementation of a generic mechanism which will provide support for converting between different disk templates in Ganeti.

Signed-off-by: Dimitris Bliablias <[email protected]>
Signed-off-by: Constantinos Venetsanopoulos <[email protected]>
---
 Makefile.am                    |   1 +
 doc/design-disk-conversion.rst | 281 ++++++++++++++++++++++++++++++++++++++++
 doc/design-draft.rst           |   1 +
 3 files changed, 283 insertions(+)
 create mode 100644 doc/design-disk-conversion.rst

--- 1.7.10.4
diff --git a/Makefile.am b/Makefile.am
index 40f0a1a..996c8bd 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -579,6 +579,7 @@ docinput = \
 	doc/design-cpu-speed.rst \
 	doc/design-daemons.rst \
 	doc/design-device-uuid-name.rst \
+	doc/design-disk-conversion.rst \
 	doc/design-draft.rst \
 	doc/design-file-based-storage.rst \
 	doc/design-glusterfs-ganeti-support.rst \
diff --git a/doc/design-disk-conversion.rst b/doc/design-disk-conversion.rst
new file mode 100644
index 0000000..47f94a8
--- /dev/null
+++ b/doc/design-disk-conversion.rst
@@ -0,0 +1,281 @@
+=================================
+Conversion between disk templates
+=================================
+
+.. contents:: :depth: 4
+
+This design document describes the support for generic disk template
+conversion in Ganeti. The logic used is disk template agnostic and
+aims to cover the majority of conversions among the supported disk
+templates.
+
+
+Current state and shortcomings
+==============================
+
+Currently, Ganeti supports choosing among different disk templates when
+creating an instance. However, converting the disk template of an
+existing instance is possible only between the ``plain`` and ``drbd``
+templates. This feature was added in the early versions of Ganeti,
+when the number of supported disk templates was limited. Now that Ganeti
+supports plenty of choices, this feature should be extended to provide
+more flexibility to the user.
+
+The procedure for converting from the plain to the drbd disk template
+works as follows. First, a completely new disk template is generated
+matching the size, mode, and the count of the current instance's disks.
+The missing volumes are created manually both in the primary (meta disk)
+and the secondary node. The original LVs running on the primary node are
+renamed to match the new names. The last step is to manually associate
+the DRBD devices with their mirror block device pairs. The conversion
+from the drbd to the plain disk template is much simpler than the
+opposite. First, the DRBD mirroring is manually disabled. Then the
+unnecessary volumes, including the meta disk(s) of the primary node and
+the meta and data disk(s) of the previously secondary node, are
+removed.
+
+
+Proposed changes
+================
+
+This design proposes the creation of a unified interface for handling
+the disk template conversions in Ganeti. Currently, there is no such
+interface and each one of the supported conversions uses a separate code
+path.
+
+This proposal introduces a single, disk-agnostic interface for handling
+the disk template conversions in Ganeti, keeping in mind that we want it
+to be as generic as possible. An exception will be the currently
+supported conversions between the LVM-based disk templates. Their basic
+functionality will not be affected and will diverge from the rest of the
+disk template conversions. The target is to provide support for
+conversions among the majority of the available disk templates, and also
+to create a mechanism that will easily support any new templates that
+may be added to Ganeti in the future.
+
+
+Design decisions
+================
+
+Currently, the supported conversions for the LVM-based templates are
+handled by the ``LUInstanceSetParams`` LU. Our implementation will
+follow the same approach. From a high-level point of view, this design
+can be split into two parts:
+
+* The extension of the LU's checks to cover all the supported template
+  conversions
+
+* The new functionality which will be introduced to provide the new
+  feature
+
+The instance must be stopped before starting the disk template
+conversion, as is the case now, otherwise the operation will fail. The
+new mechanism will need to copy the disk's data for the conversion to be
+possible. We propose using the Unix ``dd`` command to copy the
+instance's data. It can be used to copy data from source to destination,
+block-by-block, regardless of their filesystem types, making it a
+convenient tool for the case. Since the conversion will be done via data
+copy, it will take a long time for bigger disks to copy their data and
+consequently for the instance to switch to the new template.
+
+Some template conversions can be done faster without explicitly copying
+their disks' data. A use case is the conversions between the LVM-based
+templates, i.e., ``drbd`` and ``plain``, which will be done as happens
+now and not using the ``dd`` command. Also, this implementation will
+provide partial support for the ``blockdev`` disk template, which will
+act only as a source template. Since those volumes are adopted
+pre-existing block devices, we will not support conversions targeting
+this template. Another exception will be the ``diskless`` template.
+Since it is a testing template that creates instances with no disks, we
+will not provide support for conversions that include this template
+type.
+
+
+We divide the design into the following parts:
+
+* Block device changes, which include the new methods which will be
+  introduced and will be responsible for building the commands for the
+  data copy from/to the requested devices
+
+* Backend changes, which include a new RPC call which will concatenate
+  the output of the above two methods and will execute the data copy
+  command
+
+* Core changes, which include the modifications in the Logical Unit
+
+* User interface changes, i.e., command line changes
+
+
+Block device changes
+--------------------
+
+The block device abstract class will be extended with two new methods,
+named ``Import`` and ``Export``. Those methods will be responsible for
+building the commands that will be used for the data copy between the
+corresponding devices. The ``Export`` method will build the command
+which will export the data from the source device, while the ``Import``
+method will do the opposite. It will import the data to the newly
+created target device. Those two methods will not perform the actual
+data copy; they will simply return the requested commands for
+transferring the data from/to the individual devices. The output of the
+two methods will be combined using a pipe ("|") by the calling method
+at the backend level.
+
+By default the data import and export will be done using the ``dd``
+command. All the derived classes will use the base functionality
+unless there is a faster way to convert. In that case the underlying
+block device will override those methods with its specific
+functionality. A use case will be the Ceph/RADOS block devices, which
+will make use of the ``rbd import`` and ``rbd export`` commands to copy
+their data instead of using the default ``dd`` command.
+
+Keeping the data copy functionality in the block device layer provides
+us with a generic mechanism that works between almost all conversions
+and furthermore can be easily extended for new disk templates. It also
+covers the devices that support the ``access=userspace`` parameter and
+solves this problem in a generic way, by implementing the logic at the
+right level, where we know what is best to do for each device.
+
+
+Backend changes
+---------------
+
+Introduce a new RPC call:
+
+* blockdev_convert(src_disk, dest_disk)
+
+where ``src_disk`` and ``dest_disk`` are the original and the new disk
+objects respectively. First, the actual device instances will be
+computed and then they will be used to build the export and import
+commands for the data copy. The output of those methods will be
+concatenated using a pipe, following an approach similar to the impexp
+daemon. Finally, the unified data copy command will be executed, at this
+level, by the ``nodeD``.
+
+
+Core changes
+------------
+
+The main modifications will be made in the ``LUInstanceSetParams`` LU.
+The implementation of the conversion mechanism will be split into the
+following parts:
+
+* The generation of the new disk template for the instance. The new
+  disks will match the size, mode, and name of the original volumes.
+  Those parameters and any other needed, i.e., the provider's name for
+  the ExtStorage conversions, will be computed by a new method which we
+  will introduce, named ``ComputeDisksInfo``. The output of that
+  function will be used as the ``disk_info`` argument of the
+  ``GenerateDiskTemplate`` method.
+
+* The creation of the new block devices. We will make use of the
+  ``CreateDisks`` method which creates and attaches the new block
+  devices.
+
+* The data copy for each disk of the instance from the original to the
+  newly created volume. The data copy will be made by the ``nodeD`` with
+  the RPC call we have introduced earlier in this design. In case some
+  disks fail to copy their data the operation will fail and the newly
+  created disks will be removed. The instance will remain intact.
+
+* The detachment of the original disks of the instance when the data
+  copy operation successfully completes by calling the
+  ``RemoveInstanceDisk`` method for each instance's disk.
+
+* The attachment of the new disks to the instance by calling the
+  ``AddInstanceDisk`` method for each disk we have created.
+
+* The update of the configuration file with the new values.
+
+* The removal of the original block devices from the node using the
+  ``BlockdevRemove`` method for each one of the old disks.
+
+
+User interface changes
+----------------------
+
+The ``-t`` (``--disk-template``) option of the gnt-instance modify
+command will specify the disk template to convert *to*, as it does
+now. The remaining disk options, such as size, mode, and name, will
+be computed from the original volumes by the conversion mechanism, and
+the user will not explicitly provide them.
+
+
+ExtStorage conversions
+~~~~~~~~~~~~~~~~~~~~~~
+
+When converting to an ExtStorage disk template the
+``provider=*PROVIDER*`` option which specifies the ExtStorage provider
+will be mandatory. Also, arbitrary parameters can be passed to the
+ExtStorage provider. Those parameters will be optional and could be
+passed as additional comma-separated options. Since it is not allowed to
+convert the disk template of an instance and make use of the ``--disk``
+option at the same time, we propose to introduce a new option named
+``--ext-params`` to handle the ``ext`` template conversions.
+
+::
+
+  gnt-instance modify -t ext --ext-params provider=pvdr1 test_vm
+  gnt-instance modify -t ext --ext-params provider=pvdr1,param1=val1,param2=val2 test_vm
+
+
+File-based conversions
+~~~~~~~~~~~~~~~~~~~~~~
+
+For conversions *to* a file-based template the ``--file-storage-dir``
+and the ``--file-driver`` options could be used, similarly to the
+**add** command, to manually configure the storage directory and the
+preferred driver for the file-based disks.
+
+::
+
+  gnt-instance modify -t file --file-storage-dir=mysubdir test_vm
+
+
+Supported template conversions
+==============================
+
+This is a summary of the disk template conversions that the conversion
+mechanism will support:
+
++--------------+-----------------------------------------------------------------------------------+
+| Source       | Target Disk Template                                                              |
+| Disk         +---------+-------+------+------------+---------+------+------+----------+----------+
+| Template     | Plain   | DRBD  | File | Sharedfile | Gluster | RBD  | Ext  | BlockDev | Diskless |
++==============+=========+=======+======+============+=========+======+======+==========+==========+
+| Plain        | -       | Yes.  | Yes. | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| DRBD         | Yes.    | -     | Yes. | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| File         | Yes.    | Yes.  | -    | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| Sharedfile   | Yes.    | Yes.  | Yes. | -          | Yes.    | Yes. | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| Gluster      | Yes.    | Yes.  | Yes. | Yes.       | -       | Yes. | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| RBD          | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | -    | Yes. | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| Ext          | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | Yes. | -    | No.      | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| BlockDev     | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | Yes. | Yes. | -        | No.      |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+| Diskless     | No.     | No.   | No.  | No.        | No.     | No.  | No.  | No.      | -        |
++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
+
+
+Future Work
+===========
+
+Expand the conversion mechanism to provide a visual indication of the
+data copy operation. We could monitor the progress of the data sent via
+a pipe, and provide the user with information such as the time elapsed,
+percentage completed (probably with a progress bar), total data
+transferred, and so on, similar to the progress tracking that is
+currently done by the impexp daemon.
+
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 55bed7c..95ecdc6 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -23,6 +23,7 @@ Design document drafts
    design-node-security.rst
    design-systemd.rst
    design-cpu-speed.rst
+   design-disk-conversion.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
-- 
1.7.10.4
