> +This behaviour is clearly wrong, but the problem doesn't arise often in the
> +current setup, due to the fact that instances currently only have a single
> +storage type.
I guess the main reason why it works in practice is that admins tend to group
instances into node groups by their needs. Otherwise, a single storage type
per instance wouldn't suffice if we freely mixed and matched, e.g., file
storage and DRBD.
> +Proposed changes
> +================
> +
> +Definitions
> +-----------
> +
> +* All disks have exactly one *desired storage unit*, which determines where
> +  and how the disk can be stored. If the disk is transferred, the desired
> +  storage unit remains unchanged. The desired storage unit includes
> +  specifics like the volume group in the case of LVM-based storage.
> +* A *storage unit* is a specific storage location on a specific node. Storage
> +  units have exactly one desired storage unit they can contain. A storage
> +  unit further has a name, a total capacity, and a free capacity.
> +* For the purposes of this document a *disk* has a desired storage unit and
> +  a size.
> +* A *disk can be moved* to a node if there is at least one storage unit on
> +  that node which can contain the desired storage unit of the disk and if the
> +  free capacity is at least the size of the disk.
> +* An *instance can be moved* to a node if all its disks can be moved there
> +  one-by-one.
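Just to make sure I read these definitions correctly, this is how I would
sketch the two predicates (Python-style pseudo-code, all names made up; I read
"one-by-one" as each disk being checked independently, so if cumulative
accounting is meant, the free capacity would have to be decremented as the
disks are placed):

  # Sketch of the two definitions above; purely illustrative, not Ganeti code.
  def disk_can_be_moved(disk, storage_units):
      # A disk fits a node if some storage unit on that node can contain the
      # disk's desired storage unit and has enough free capacity.
      return any(unit["desired"] == disk["desired"]
                 and unit["free"] >= disk["size"]
                 for unit in storage_units)

  def instance_can_be_moved(disks, storage_units):
      # An instance fits a node if each of its disks fits on its own.
      return all(disk_can_be_moved(disk, storage_units) for disk in disks)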
> +
> +LUXI extension
> +--------------
> +
> +The LUXI protocol is extended to include:
Please specify for which entity this information is included.
> +
> +* ``storage``: a list of objects (storage units), each containing in order:
> +
> +  #. storage type
> +  #. storage key (e.g. volume group name)
> +  #. extra parameters (e.g. flag for exclusive storage) as a list
> +  #. amount free in MiB
> +  #. amount total in MiB
> +
> +.. code-block:: javascript
> +
> +  {
> +    "storage": [
> +      { "stype": ["drbd8", "xenvg", [false]]
> +      , "free": 2000
> +      , "total": 4000
> +      },
> +      { "stype": ["file", "/path/to/storage1", []]
> +      , "free": 5000
> +      , "total": 10000
> +      },
> +      { "stype": ["file", "/path/to/storage2", []]
> +      , "free": 1000
> +      , "total": 20000
> +      },
> +      { "stype": ["lvm-vg", "xenssdvg", [false]]
> +      , "free": 1024
> +      , "total": 1024
> +      }
> +    ]
> +  }
> +
> +is an instance with an LVM volume group mirrored over DRBD, two file storage
"instance" or "node". Note that, since "instance" has a special meaning in
Ganeti,
if you use that word in its normal English meaning you have to specify of what
it is an instance of.
> +directories, one half full, one mostly full, and a non-mirrored volume group.
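As a side remark, consuming this structure should be straightforward; roughly
(sketch only, in Python rather than Haskell, and not actual htools code):

  # Sketch: collect the free space per storage unit from a node's "storage"
  # list as shown in the example above. Illustrative only.
  def free_by_storage_unit(node_info):
      free = {}
      for unit in node_info.get("storage", []):
          stype, key, _params = unit["stype"]
          free[(stype, key)] = free.get((stype, key), 0) + unit["free"]
      return free

  # For the example node this yields:
  # {("drbd8", "xenvg"): 2000, ("file", "/path/to/storage1"): 5000,
  #  ("file", "/path/to/storage2"): 1000, ("lvm-vg", "xenssdvg"): 1024}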
> +
> +The storage type ``drbd8`` needs to be added in order to differentiate
> +between mirrored storage and non-mirrored storage.
> +
> +IAllocator protocol extension
> +-----------------------------
> +
> +The same field is optionally present in the IAllocator protocol:
> +
> +* a new "storage" column is added, which is a semicolon-separated list of
> +  comma-separated fields, in the order
> +
> +  #. ``stype``
> +  #. ``free``
> +  #. ``total``
> +
> +For example:
> +
> + drbd8,2000,4000;file,5000,10000;file,1000,20000;lvm-vg,1024,1024
That looks more like the text format. The IAllocator protocol is a JSON
protocol.
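Presumably the node entries would then just carry the same structure as in the
LUXI example above, e.g. (a sketch written as a Python literal; the field name
and exact nesting are my guesses, not a definition of the protocol):

  # Sketch only: my guess at the shape of the per-node storage data in the
  # IAllocator input, mirroring the LUXI example above.
  node_storage = [
      {"stype": ["drbd8", "xenvg", [False]], "free": 2000, "total": 4000},
      {"stype": ["file", "/path/to/storage1", []], "free": 5000, "total": 10000},
  ]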
> +
> +Interpretation
> +--------------
> +
> +hbal and hail will use this information only if available; if the data file
> +doesn't contain the ``storage`` field, the old algorithm is used.
> +
> +If the node information contains the ``storage`` field, hbal and hail will
> +assume that only the space compatible with the disk's requirements is
> +available. For an instance to fit a node, all its disks need to fit there
> +separately. For a disk to fit a node, a storage unit of the type of
> +the disk needs to have enough free space to contain it.
Please also specify what is supposed to happen if the new ``storage`` field
promises more space than the current total sum fields.
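Also, if "fit there separately" is meant cumulatively (cf. my remark on the
definitions above), I would read it roughly like this (sketch; the greedy
choice of the unit with the most free space is purely my assumption and not
something the document specifies):

  # Sketch of a cumulative reading of "all disks need to fit separately":
  # place the disks one after another, decrementing the free space of the
  # chosen storage unit. Illustrative only, not actual hbal/hail code.
  def instance_fits_node(disks, storage_units):
      free = {id(unit): unit["free"] for unit in storage_units}
      for disk in disks:
          candidates = [unit for unit in storage_units
                        if unit["desired"] == disk["desired"]
                        and free[id(unit)] >= disk["size"]]
          if not candidates:
              return False
          chosen = max(candidates, key=lambda unit: free[id(unit)])
          free[id(chosen)] -= disk["size"]
      return True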
> +
> +Balancing
> +---------
> +
> +In order to determine a storage location for an instance, we collect metrics
> +analogous to the current total node free space metric -- namely the standard
> +deviation statistic of the free space per storage unit.
> +
> +The full storage metric for a given desired storage unit is a weighted sum
> +of the standard deviation metrics of the storage units. The weights of the
> +storage units are proportional to the total of that storage unit and sum up
> +to the weight of space in the old implementation (1.0).
This has the effect that the most scarce resource is valued the least. Is this
on purpose? I'm thinking of a situation with lots of storage on conventional
spinning disks and a limited amount of solid-state disks.
> +This is necessary to
> +
> +#. Keep the metric compatible.
> +#. Prevent the metric of a node with many storage units from being dominated
> +   by them.
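To make sure I parse the metric correctly, here is my current reading as a
sketch (Python rather than Haskell; using the relative free space per unit is
my assumption, mirroring the existing free-disk metric, so please correct me
where I am off):

  from statistics import pstdev

  # Sketch of my reading of the proposed storage metric; illustrative only.
  def storage_metric(nodes):
      # nodes: one dict per node, mapping storage unit name -> (free, total)
      unit_names = {name for node in nodes for name in node}
      totals = {name: sum(node[name][1] for node in nodes if name in node)
                for name in unit_names}
      grand_total = sum(totals.values()) or 1.0  # avoid division by zero
      metric = 0.0
      for name in unit_names:
          # standard deviation of this unit's free fraction across the nodes
          fractions = [node[name][0] / node[name][1]
                       for node in nodes if name in node and node[name][1]]
          spread = pstdev(fractions) if fractions else 0.0
          # weight proportional to the unit's total capacity; weights sum to 1.0
          metric += totals[name] / grand_total * spread
      return metric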
> +
> +Note that the metric is independent of the storage type to be placed, but the
> +other types don't change the ranking of the possible placements.
I fail to understand that sentence. Can you please elaborate on it?
--
Klaus Aehlig
Google Germany GmbH, Dienerstr. 12, 80331 Muenchen
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschaeftsfuehrer: Graham Law, Christine Elizabeth Flores