On Tue, May 13, 2014 at 9:50 AM, 'Jose A. Lopes' via ganeti-devel <
[email protected]> wrote:

> On Apr 23 16:22, Dimitris Aragiorgis wrote:
> > This design doc describes how to extend the existing network
> > management and make it more flexible and able to deal with more
> > generic use cases. It proposes support for:
> >
> >  - Networks with multiple subnets
> >  - Subnets with multiple IP pools
> >  - NICs with multiple IPs from various subnets of a single network
> >
> > Signed-off-by: Dimitris Aragiorgis <[email protected]>
> > ---
> >
> > Hello team,
> >
> > After our discussions during GanetiCon 2013 and a recent discussion with
> > Jose, I'm sending the revised design document for networks, incorporating
> > all your comments.
> >
> > Looking forward to your feedback,
> > dimara
> >
> >  Makefile.am             |    1 +
> >  doc/design-draft.rst    |    1 +
> >  doc/design-network2.rst |  400 +++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 402 insertions(+)
> >  create mode 100644 doc/design-network2.rst
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index f2589e6..140608f 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -586,6 +586,7 @@ docinput = \
> >       doc/design-multi-reloc.rst \
> >       doc/design-multi-version-tests.rst \
> >       doc/design-network.rst \
> > +     doc/design-network2.rst \
> >       doc/design-node-add.rst \
> >       doc/design-oob.rst \
> >       doc/design-openvswitch.rst \
> > diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> > index 55bed7c..926f35b 100644
> > --- a/doc/design-draft.rst
> > +++ b/doc/design-draft.rst
> > @@ -23,6 +23,7 @@ Design document drafts
> >     design-node-security.rst
> >     design-systemd.rst
> >     design-cpu-speed.rst
> > +   design-network2.rst
> >
> >  .. vim: set textwidth=72 :
> >  .. Local Variables:
> > diff --git a/doc/design-network2.rst b/doc/design-network2.rst
> > new file mode 100644
> > index 0000000..84a44e8
> > --- /dev/null
> > +++ b/doc/design-network2.rst
> > @@ -0,0 +1,400 @@
> > +============================
> > +Network Management (revised)
> > +============================
> > +
> > +.. contents:: :depth: 4
> > +
> > +This is a design document detailing how to extend the existing network
> > +management and make it more flexible and able to deal with more generic
> > +use cases.
> > +
> > +
> > +Current state and shortcomings
> > +------------------------------
> > +
> > +Currently in Ganeti, networks are tightly connected with IP pools,
> > +since creation of a network implies the existence of one subnet
> > +and the corresponding IP pool. This design does not allow common
> > +scenarios like:
> > +
> > +- L2 only networks
> > +- IPv6 only networks
> > +- Networks without an IP pool
> > +- Networks with an IPv6 pool
> > +- Networks with multiple IP pools (alternative to externally reserving
> > +  IPs)
> > +
> > +Additionally one cannot have multiple IP pools inside one network.
> > +Finally, from the instance perspective, a NIC cannot get more than one
> > +IP (v4 and v6).
> > +
> > +
> > +Proposed changes
> > +----------------
> > +
> > +In order to deal with the above shortcomings, we propose to extend
> > +the existing networks in Ganeti and support:
> > +
> > +a) Networks with multiple subnets
> > +b) Subnets with multiple IP pools
> > +c) NICs with multiple IPs from various subnets of a single network
> > +
> > +These changes bring up some design and implementation issues that we
> > +discuss in the following sections.
> > +
> > +Semantics
> > +++++++++++
> > +
> > +Quoting the initial network management design doc "an IP pool consists
> > +of two bitarrays. Specifically the ``reservations`` bitarray which holds
> > +all IP addresses reserved by Ganeti instances and the ``external
> > +reservations`` bitarray with all IPs that are excluded from the IP pool
> > +and cannot be assigned automatically by Ganeti to instances (via
> > +ip=pool)".
> > +
> > +Without violating those semantics, here, we clarify the following
> > +definitions.
> > +
> > +**network**: A cluster level taggable configuration object with a
> > +user-provided name (e.g. network1, network2), UUID and MAC prefix.
> > +
> > +**L2**: The `mode` and `link` with which we connect a network to a
> > +nodegroup. A NIC attached to a network will inherit this info, just like
> > +connecting an Ethernet cable to a physical NIC. In this sense we only
> > +have one L2 info per NIC.
> > +
> > +**L3**: A CIDR and a gateway related to the network. Since a NIC can
> > +have multiple IPs on the same cable each network can have multiple L3
> > +info with the restriction that they do not overlap with each other.
>
> Hi,
>
> Great design document.  Great job!
>
> I would like to ask a few things.
>
> Is the gateway optional?
>
> > +
> > +**subnet**: A subnet is the above L3 info plus some additional information
> > +(see below).
> > +
> > +**ip**: A valid IP should reside in a network's subnet, and should not
> > +be used by more than one instance. An IP can be either obtained dynamically
> > +from a pool or requested explicitly from a subnet (or a pool).
> > +
> > +**range**: Sequential IPs inside one subnet calculated either from the
> > +first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and
> > +the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can
> > +also be thought of as an IP range with size=1 (see configuration
> > +changes).
> > +
> > +**reservations**: All IPs that are used by instances in the cluster at
> > +any time.
> > +
> > +**external reservations**: All IPs that are supposed to be reserved
> > +by the admin for either some external component or specific instances.
> > +If one instance requests an external IP explicitly (ip=192.0.2.100),
> > +Ganeti will allow the operation only if ``--force`` is given. Still, the
> > +admin can externally reserve an IP that is already in use by an
> > +instance, as happens now. This helps to reserve an IP for future use and
> > +at the same time prevent any possible race between the instance that
> > +releases this IP and another that tries to retrieve it.
> > +
> > +**pool**: A (range, reservations, name) tuple from which instances can
> > +dynamically obtain an IP. Reservations is a bitarray with
> > +length the size of the range, and is needed so that we know which IPs
> > +are available at any time without querying all instances. The use of
> > +name is explained below. A subnet can have multiple pools.
> > +
> > +
> > +Split L2 from L3
> > +++++++++++++++++
> > +
> > +Currently networks in Ganeti do not separate L2 from L3. This means
> > +that one cannot use L2 only networks. The reason is that the CIDR
> > +(passed currently with the ``--network`` option) and the derived IP pool
> > +are mandatory. This design makes L3 info optional. This way we can have
> > +an L2 only network just by connecting a Ganeti network to a nodegroup
> > +with the desired `mode` and `link`. Then one could add one or more subnets
> > +to the existing network.
> > +
> > +
> > +Multiple Subnets per Network
> > +++++++++++++++++++++++++++++
> > +
> > +Currently the IPv4 CIDR is mandatory for a network. Also a network can
> > +obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will
> > +be lifted.
> > +
> > +This design doc introduces support for multiple subnets per network. The
> > +L3 info will be moved inside the subnet. A subnet will have a `name` and
> > +a `uuid` just like NIC and Disk config objects. Additionally it will contain
> > +the `dhcp` flag which is explained below, and the `pools` and `external`
> > +fields which are mentioned in the next section. Only the `cidr` will be
> > +mandatory.
> > +
> > +Any subnet related actions will be done via the new ``--subnet`` option.
> > +Its syntax will be similar to ``--net``.
> > +
> > +The network's subnets must not overlap with each other. Logic will
> > +validate any operations related to reserving/releasing of IPs and check
> > +whether a requested IP is included inside one of the network's subnets.
> > +Just like currently, the L3 info will be exported to NIC configuration
> > +hooks and scripts as environment variables. The example below adds
> > +subnets to a network:
> > +
> > +::
> > +
> > +  gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1
> > +  gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1
> > +
> > +To remove a subnet from a network one should use:
> > +
> > +::
> > +
> > +  gnt-network modify --subnet some-ident:remove network1
> > +
> > +where some-ident can be either a CIDR, a name or a UUID. Ganeti will
> > +allow this operation only if no instances use IPs from this subnet.
> > +
> > +Since DHCP is allowed only for a single CIDR on the same cable, the
> > +subnet must have a `dhcp` flag. Logic must not allow more than one
> > +subnet of the same version in the same network to have dhcp enabled. To
> > +modify a subnet's name or dhcp flag one could use:
> > +
> > +::
> > +
> > +  gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1
> > +
> > +This would search for a registered subnet that matches the identifier,
> > +disable DHCP on it and change its name. If ``dhcp=true`` is passed,
> > +logic will first check if another subnet of the same version has dhcp
> > +enabled.
>
> Could you please help me understand what 'subnet of the same version'
> means?  I am not familiar with this terminology.
>

I thought version here meant IP version, as in IPv4 vs IPv6. (Correct me,
Dimitris, if I am wrong.)
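
If that reading is right, the check the design describes could look
roughly like the sketch below. The function name `find_dhcp_conflict`
and the (cidr, dhcp) tuple layout are made up for illustration and are
not part of the patch; only the standard `ipaddress` module is used.

```python
import ipaddress


def find_dhcp_conflict(other_subnets, new_cidr):
    """Return the CIDR of a DHCP-enabled subnet that has the same IP
    version as new_cidr, or None if there is no such subnet.

    other_subnets is a hypothetical list of (cidr, dhcp) tuples for the
    other subnets of the network; it is not Ganeti's config format.
    """
    version = ipaddress.ip_network(new_cidr).version
    for cidr, dhcp in other_subnets:
        if dhcp and ipaddress.ip_network(cidr).version == version:
            return cidr
    return None


# Hypothetical network: one IPv4 and one IPv6 subnet, both with dhcp=true.
existing = [("10.0.0.0/24", True), ("2001::/64", True)]
conflict = find_dhcp_conflict(existing, "192.0.2.0/24")
```

Under this interpretation, enabling dhcp on a second IPv4 subnet would
be refused, while an IPv6 subnet in the same network would be judged
independently of the IPv4 ones.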


>
> Also, is it the case that the 'dhcp' parameter is meant only for
> validation purposes?  In other words, is Ganeti enabling DHCP here?
> If this parameter is only used for validation purposes, we have to be
> careful not to mislead people into thinking that Ganeti is actually
> starting a DHCP service.
>
> > +
> > +Changing the CIDR or the gateway of a subnet should also be supported.
> > +
> > +::
> > +
> > +  gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1
> > +  gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1
> > +  gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1
> > +
> > +Before expanding a subnet, logic should check for overlapping
> > +subnets. Shrinking the subnet should be allowed only if the ranges
> > +that are about to be trimmed are not included either in pool
> > +reservations or external ranges.
> > +
> > +
> > +Multiple IP pools per Subnet
> > +++++++++++++++++++++++++++++
> > +
> > +Currently IP pools are automatically created during network creation and
> > +include the whole subnet. Some IPs can be excluded from the pool by
> > +passing them explicitly with ``--add-reserved-ips`` option.
> > +
> > +Still for IPv6 subnets or even big IPv4 ones this might be insufficient.
> > +It is impossible to have two bitarrays for a /64 prefix. Even for IPv4
> > +networks a /20 subnet currently requires 8K long bitarrays. And the
> > +second 4K is practically useless since the external reservations are way
> > +less than the actual reservations.
> > +
> > +This design extracts IP pool management from the network logic, and pools
> > +will become optional. Currently the pool is created based on the
> > +network's CIDR. With multiple subnets per network, we should be able to
> > +create and add IP pools to a network (and eventually to the
> > +corresponding subnet). Each pool will have an optional user friendly
> > +`name` so that the end user can refer to it (see instance related
> > +operations).
> > +
> > +The user will be able to dynamically obtain an IP only if we have
> > +already defined a pool for a network's subnet. One would use ``ip=pool``
> > +for the first available IP of the first available pool, or
> > +``ip=some-pool-name`` for the first available IP of a specific pool.
> > +
> > +Any pool related actions will be done via the new ``--pool`` option.
> > +
> > +In order to add a pool a relevant subnet should pre-exist. Overlapping
> > +pools won't be allowed. For example:
> > +
> > +::
> > +
> > +  gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1
> > +  gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1
> > +  gnt-network modify --pool add:10.0.0.100 net1
>
> This is very cool.  We can simplify things by making reserved IPs just
> another IP pool.  Some examples:
>
>   gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1,reserved=true net1
>   gnt-network modify --pool add:10.0.0.7-10.0.0.20,reserved=true net1
>   gnt-network modify --pool add:10.0.0.100,reserved=true net1
>
> This way we don't have to have 2 places internally to keep track of
> internal and external reservations.  We can't just reuse the same
> concepts and the same code.  Naturally, for the case of reserved IP
> pools we would not construct the bitarray.
>
> What do you think?  Do you see any problems with this?
>
> > +
> > +will first parse and find the ranges. Then for each range, Ganeti will
> > +try to find a matching subnet meaning that a pool must be a subrange of
> > +the subnet. If found, the range with empty reservations will be appended
> > +to the list of the subnet's pools. Moreover, logic must be added to
> > +reserve the IPs that are currently in use by instances of this network.
> > +
> > +During pool removal, logic should be added to split pools if ranges
> > +given overlap existing ones. For example:
> > +
> > +::
> > +
> > +  gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1
> > +
> > +will split the pool previously added (10-100) into two new ones;
> > +10-19 and 51-100. The corresponding bitarrays will be trimmed
> > +accordingly. The name will be preserved.
> > +
> > +The same things apply to external reservations. Just like now,
> > +modifications will take place via the ``--add|remove-reserved-ips``
> > +option. Logic must be added to support IP ranges.
> > +
> > +Based on the aforementioned we propose the following changes:
> > +
> > +1) Change the IP pool representation in config data.
> > +
> > +  Existing `reservations` and `external_reservations` bitarrays will be
> > +  removed. Instead, for each subnet we will have:
> > +
> > +  * `pools`: List of (IP range, reservations bitarray) tuples.
> > +  * `external`: List of IP ranges
> > +
> > +  For external ranges the reservations bitarray is not needed
> > +  since this will be all 1's.
> > +
> > +2) Change the network module logic.
> > +
> > +  The above changes should be done in the network module and be transparent
> > +  to the rest of the Ganeti code. If a random IP from the networks is
> > +  requested, Ganeti searches for an available IP from the first pool of
> > +  the first subnet. If it is full it gets to the next pool. Then to the
> > +  next subnet and so on. Of course the `external` IP ranges will be
> > +  excluded. If an IP is explicitly requested, Ganeti will try to find a
> > +  matching subnet. Its pools and external will be checked for
> > +  availability. All this logic will be extracted in a separate class
> > +  with helper methods for easier manipulation of IP ranges and
> > +  bitarrays.
> > +
> > +3) Changes in config module.
> > +
> > +  We should not have instances with the same IP inside the same network.
> > +  We introduce _AllIPs() helper config method that will hold all existing
> > +  (IP, network) tuples. Config logic will check this list as well
> > +  before passing it to TemporaryReservationManager.
> > +
> > +4) Change the query mechanism.
> > +
> > +  Since we have more than one subnet, the new `subnets` field will
> > +  include a list of:
> > +
> > +  * cidr: IPv4 or IPv6 CIDR
> > +  * gateway: IPv4 or IPv6 address
> > +  * dhcp: True or False
> > +  * name: The user friendly name for the subnet
> > +
> > +  Since we want to support small pools inside big subnets, current query
> > +  results are not practical as far as the `map` field is concerned. It
> > +  should be replaced with the new `pools` field for each subnet, which will
> > +  contain:
> > +
> > +  * start: The first IP of the pool
> > +  * end: The last IP of the pool
> > +  * map: A string with 'X' for reserved IPs (either external or not) and
> > +    with '.' for all available ones inside the pool
> > +
> > +
> > +
> > +Multiple IPs per NIC
> > +++++++++++++++++++++
> > +
> > +Currently IP is a simple string inside the NIC object and there is a
> > +one-to-one mapping between the `ip` and the `network` slots. The whole
> > +logic behind this is that a NIC belongs to a network (cable) and
> > +inherits its mode and link. This rationale will not change.
> > +
> > +Since this design adds support for multiple subnets per network, a NIC
> > +must be able to obtain multiple IPs from various subnets of the same
> > +network. Thus we change the `ip` slot into a list.
> > +
> > +During instance related operations, one should use something like:
> > +
> > +::
> > +
> > +  gnt-instance add --net 0:ip=192.0.2.4,ip=pool,ip=some-pool-name,network=network1 inst1
> > +
> > +
> > +This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4,
> > +"pool", "some-pool-name"]) and finally passed to the corresponding opcode.
> > +Based on the previous example, here the first IP will match subnet1, the
> > +second IP will be retrieved from the first available pool of the first
> > +available subnet, and the third from the pool named some-pool-name.
> > +
> > +During instance modification, the `ip` option will refer to the first IP
> > +of the NIC, whereas the `ipx` will refer to the X'th IP.
> > +
> > +
> > +Configuration changes
> > +---------------------
> > +
> > +IPRange config object:
> > +  Introduce new config object that will hold ranges needed by pools, and
> > +  reservations. It will be either a tuple of (start, size, end) or a
> > +  simple string. The `end` is redundant and can be derived from start and
> > +  size at runtime, but will appear in the representation for readability
> > +  and debug reasons.
>
> This is good.  Internally we keep only (start, end) or (start, size) but
> when we print, we show the actual triple (start, size, end).  Cool!
>
> Cheers,
> Jose
>
> > +
> > +Pool config object:
> > +  Introduce new config object to represent a single subnet's pool. It
> > +  will have the `range`, `reservations`, `name` slots. The range slot
> > +  will be an IPRange config object, the reservations a bitarray and the
> > +  name a simple string.
> > +
> > +Subnet config object:
> > +  Introduce new config object with slots: `name`, `uuid`, `cidr`,
> > +  `gateway`, `dhcp`, `pools`, `external`. Pools is a list of Pool config
> > +  objects. External is a list of IPRange config objects. All ranges must
> > +  reside inside the subnet's CIDR. Only `cidr` will be mandatory. The
> > +  `dhcp` attribute will be False by default.
> > +
> > +Network config objects:
> > +  The L3 and the IP pool representation will change. Specifically all
> > +  slots besides `name`, `mac_prefix`, and `tag` will be removed. Instead
> > +  the slot `subnets` with a list of Subnet config objects will be added.
> > +
> > +NIC config objects:
> > +  NIC's network slot will be removed and the `ip` slot will be modified
> > +  to a list of strings.
> > +
> > +KVM runtime files:
> > +  Any change done in config data must be done also in KVM runtime files.
> > +  For this purpose the existing _UpgradeSerializedRuntime() can be used.
> > +
> > +
> > +Exported variables
> > +------------------
> > +
> > +The variables exported during instance related operations will be named
> > +the way Linux names interface aliases. Specifically:
> > +
> > +``IP:i`` for the ith IP.
> > +
> > +``NETWORK_*:i`` for the ith subnet. * is SUBNET, GATEWAY, DHCP.
> > +
> > +In case of hooks those variables will be prefixed with ``INSTANCE_NICn``
> > +for the nth NIC.
> > +
> > +
> > +Backwards Compatibility
> > +-----------------------
> > +
> > +The existing networks representation will be internally modified.
> > +They will obtain one subnet, and one pool with range the whole subnet.
> > +
> > +During `gnt-network add`, if the deprecated ``--network`` option is passed,
> > +Ganeti will still create a network with one subnet, and one IP pool with the
> > +size of the subnet. Otherwise ``--subnet`` and ``--pool`` options
> > +will be needed.
> > +
> > +The query mechanism will also include the deprecated `map` field. For the
> > +newly created network this will contain only the mapping of the first
> > +pool. The deprecated `network`, `gateway`, `network6`, `gateway6` fields
> > +will point to the first IPv4 and IPv6 subnet respectively.
> > +
> > +During instance related operation the `ip` argument of the ``--net``
> > +option will refer to the first IP of the NIC.
> > +
> > +Hooks and scripts will still have the same environment exported in case
> > +of single IP per NIC.
> > +
> > +
> > +.. vim: set textwidth=72 :
> > +.. Local Variables:
> > +.. mode: rst
> > +.. fill-column: 72
> > +.. End:
> > --
> > 1.7.10.4
>
>
> Cheers,
Helga
