On Tue, May 13, 2014 at 9:50 AM, 'Jose A. Lopes' via ganeti-devel < [email protected]> wrote:
> On Apr 23 16:22, Dimitris Aragiorgis wrote: > > This design doc describes how to extend the existing network > > management and make it more flexible and able to deal with more > > generic use cases. It proposes support for: > > > > - Networks with multiple subnets > > - Subnets with multiple IP pools > > - NICs with multiple IPs from various subnets of a single network > > > > Signed-off-by: Dimitris Aragiorgis <[email protected]> > > --- > > > > Hello team, > > > > After our discussions during GanetiCon 2013 and a recent discussion with > > Jose, I'm sending the revised design document for networks, incorporating > > all your comments. > > > > Looking forward to your feedback, > > dimara > > > > Makefile.am | 1 + > > doc/design-draft.rst | 1 + > > doc/design-network2.rst | 400 > +++++++++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 402 insertions(+) > > create mode 100644 doc/design-network2.rst > > > > diff --git a/Makefile.am b/Makefile.am > > index f2589e6..140608f 100644 > > --- a/Makefile.am > > +++ b/Makefile.am > > @@ -586,6 +586,7 @@ docinput = \ > > doc/design-multi-reloc.rst \ > > doc/design-multi-version-tests.rst \ > > doc/design-network.rst \ > > + doc/design-network2.rst \ > > doc/design-node-add.rst \ > > doc/design-oob.rst \ > > doc/design-openvswitch.rst \ > > diff --git a/doc/design-draft.rst b/doc/design-draft.rst > > index 55bed7c..926f35b 100644 > > --- a/doc/design-draft.rst > > +++ b/doc/design-draft.rst > > @@ -23,6 +23,7 @@ Design document drafts > > design-node-security.rst > > design-systemd.rst > > design-cpu-speed.rst > > + design-network2.rst > > > > .. vim: set textwidth=72 : > > .. Local Variables: > > diff --git a/doc/design-network2.rst b/doc/design-network2.rst > > new file mode 100644 > > index 0000000..84a44e8 > > --- /dev/null > > +++ b/doc/design-network2.rst > > @@ -0,0 +1,400 @@ > > +============================ > > +Network Management (revised) > > +============================ > > + > > +.. 
contents:: :depth: 4 > > + > > +This is a design document detailing how to extend the existing network > > +management and make it more flexible and able to deal with more generic > > +use cases. > > + > > + > > +Current state and shortcomings > > +------------------------------ > > + > > +Currently in Ganeti, networks are tightly connected with IP pools, > > +since creation of a network implies the existence of one subnet > > +and the corresponding IP pool. This design does not allow common > > +scenarios like: > > + > > +- L2 only networks > > +- IPv6 only networks > > +- Networks without an IP pool > > +- Networks with an IPv6 pool > > +- Networks with multiple IP pools (alternative to externally reserving > > + IPs) > > + > > +Additionally one cannot have multiple IP pools inside one network. > > +Finally, from the instance perspective, a NIC cannot get more than one > > +IPs (v4 and v6). > > + > > + > > +Proposed changes > > +---------------- > > + > > +In order to deal with the above shortcomings, we propose to extend > > +the existing networks in Ganeti and support: > > + > > +a) Networks with multiple subnets > > +b) Subnets with multiple IP pools > > +c) NICs with multiple IPs from various subnets of a single network > > + > > +These changes bring up some design and implementation issues that we > > +discuss in the following sections. > > + > > +Semantics > > +++++++++++ > > + > > +Quoting the initial network management design doc "an IP pool consists > > +of two bitarrays. Specifically the ``reservations`` bitarray which holds > > +all IP addresses reserved by Ganeti instances and the ``external > > +reservations`` bitarray with all IPs that are excluded from the IP pool > > +and cannot be assigned automatically by Ganeti to instances (via > > +ip=pool)". > > + > > +Without violating those semantics, here, we clarify the following > > +definitions. > > + > > +**network**: A cluster level taggable configuration object with a > > +user-provider name, (e.g. 
network1, network2), UUID and MAC prefix. > > + > > +**L2**: The `mode` and `link` with which we connect a network to a > > +nodegroup. A NIC attached to a network will inherit this info, just like > > +connecting an Ethernet cable to a physical NIC. In this sense we only > > +have one L2 info per NIC. > > + > > +**L3**: A CIDR and a gateway related to the network. Since a NIC can > > +have multiple IPs on the same cable each network can have multiple L3 > > +info with the restriction that they do not overlap with each other. > > Hi, > > Great design document. Great job! > > I would like to ask a few things. > > Is the gateway optional? > > > + > > +**subnet**: A subnet is the above L3 info plus some additional > information > > +(see below). > > + > > +**ip**: A valid IP should reside in a network's subnet, and should not > > +be used by more than one instance. An IP can be either obtained > dynamically > > +from a pool or requested explicitly from a subnet (or a pool). > > + > > +**range**: Sequential IPs inside one subnet calculated either from the > > +first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and > > +the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can > > +also be thought of as an IP range with size=1 (see configuration > > +changes). > > + > > +**reservations**: All IPs that are used by instances in the cluster at > > +any time. > > + > > +**external reservations**: All IPs that are supposed to be reserved > > +by the admin for either some external component or specific instances. > > +If one instance requests an external IP explicitly (ip=192.0.2.100), > > +Ganeti will allow the operation only if ``--force`` is given. Still, the > > +admin can externally reserve an IP that is already in use by an > > +instance, as happens now. This helps to reserve an IP for future use and > > +at the same time prevent any possible race between the instance that > > +releases this IP and another that tries to retrieve it. 
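On the range definition above: the two spellings (start plus size, or start plus end) carry the same information. A minimal Python sketch of the conversion, using only the standard `ipaddress` module (Python 3). `normalize_range` is a hypothetical helper name for illustration, not something from the patch or from Ganeti:

```python
import ipaddress


def normalize_range(start, size=None, end=None):
    """Return (start, size, end) given start and exactly one of size/end.

    Hypothetical helper, not part of Ganeti.
    """
    if (size is None) == (end is None):
        raise ValueError("give exactly one of 'size' or 'end'")
    first = ipaddress.ip_address(start)
    if size is not None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # Adding an integer to an address yields the address that far ahead.
        last = first + (size - 1)
    else:
        last = ipaddress.ip_address(end)
        if last.version != first.version:
            raise ValueError("start and end must be the same IP version")
        if last < first:
            raise ValueError("end must not precede start")
        size = int(last) - int(first) + 1
    return (str(first), size, str(last))
```

With the values from the definition, `normalize_range("192.0.2.10", size=10)` and `normalize_range("192.0.2.10", end="192.0.2.19")` both give `("192.0.2.10", 10, "192.0.2.19")`, and a single IP is simply the `size=1` case.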
> > + > > +**pool**: A (range, reservations, name) tuple from which instances can > > +dynamically obtain an IP. Reservations is a bitarray with > > +length the size of the range, and is needed so that we know which IPs > > +are available at any time without querying all instances. The use of > > +name is explained below. A subnet can have multiple pools. > > + > > + > > +Split L2 from L3 > > +++++++++++++++++ > > + > > +Currently networks in Ganeti do not separate L2 from L3. This means > > +that one cannot use L2 only networks. The reason is because the CIDR > > +(passed currently with the ``--network`` option) and the derived IP pool > > +are mandatory. This design makes L3 info optional. This way we can have > > +an L2 only network just by connecting a Ganeti network to a nodegroup > > +with the desired `mode` and `link`. Then one could add one or more > subnets > > +to the existing network. > > + > > + > > +Multiple Subnets per Network > > +++++++++++++++++++++++++++++ > > + > > +Currently the IPv4 CIDR is mandatory for a network. Also a network can > > +obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will > > +be lifted. > > + > > +This design doc introduces support for multiple subnets per network. The > > +L3 info will be moved inside the subnet. A subnet will have a `name` and > > +a `uuid` just like NIC and Disk config objects. Additionally it will > contain > > +the `dhcp` flag which is explained below, and the `pools` and `external` > > +fields which are mentioned in the next section. Only the `cidr` will be > > +mandatory. > > + > > +Any subnet related actions will be done via the new ``--subnet`` option. > > +Its syntax will be similar to ``--net``. > > + > > +The network's subnets must not overlap with each other. Logic will > > +validate any operations related to reserving/releasing of IPs and check > > +whether a requested IP is included inside one of the network's subnets. 
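The two checks mentioned here (subnets of one network must not overlap, and a requested IP must fall inside one of them) could look roughly like the sketch below. It uses the standard Python 3 `ipaddress` module; `check_new_subnet` and `find_subnet` are hypothetical names, and representing a network's subnets as a plain list of CIDR strings is an assumption made for illustration only:

```python
import ipaddress


def check_new_subnet(existing_cidrs, new_cidr):
    """Raise ValueError if new_cidr overlaps any existing subnet.

    Hypothetical helper, not part of Ganeti.
    """
    new = ipaddress.ip_network(new_cidr)
    for cidr in existing_cidrs:
        old = ipaddress.ip_network(cidr)
        # Only subnets of the same IP version can overlap.
        if old.version == new.version and old.overlaps(new):
            raise ValueError("%s overlaps existing subnet %s" % (new, old))
    return str(new)


def find_subnet(cidrs, ip):
    """Return the CIDR containing ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            return cidr
    return None
```

Since the subnets are required not to overlap, at most one of them can contain a given IP, so returning the first match is enough.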
> > +Just like currently, the L3 info will be exported to NIC configuration > > +hooks and scripts as environment variables. The example below adds > > +subnets to a network: > > + > > +:: > > + > > + gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1 > > + gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1 > > + > > +To remove a subnet from a network one should use: > > + > > +:: > > + > > + gnt-network modify --subnet some-ident:remove network1 > > + > > +where some-ident can be either a CIDR, a name or a UUID. Ganeti will > > +allow this operation only if no instances use IPs from this subnet. > > + > > +Since DHCP is allowed only for a single CIDR on the same cable, the > > +subnet must have a `dhcp` flag. Logic must not allow more than one > > +subnet of the same version in the same network to have dhcp enabled. To > > +modify a subnet's name or dhcp flag one could use: > > + > > +:: > > + > > + gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1 > > + > > +This would search for a registered subnet that matches the identifier, > > +disable DHCP on it and change its name. If ``dhcp=true`` is passed, > > +logic will first check if another subnet of the same version has dhcp > > +enabled. > > Could you please help me understand what 'subnet of the same version' > means? I am not familiar with this terminology. > I thought version here meant IP version, as in IPv4 vs IPv6. (Correct me, Dimitris, if I am wrong.) > > Also, is it the case that the 'dhcp' parameter is meant only for > validation purposes? In other words, is Ganeti enabling DHCP here? > If this parameter is only used for validation purposes, we have to be > careful not to mislead people into thinking that Ganeti is actually > starting a DHCP service. > > > + > > +Changing the CIDR or the gateway of a subnet should also be supported.
> > + > > +:: > > + > > + gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1 > > + gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1 > > + gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1 > > + > > +Before expanding a subnet, logic should check for overlapping > > +subnets. Shrinking the subnet should be allowed only if the ranges > > +that are about to be trimmed are not included either in pool > > +reservations or external ranges. > > + > > + > > +Multiple IP pools per Subnet > > +++++++++++++++++++++++++++++ > > + > > +Currently IP pools are automatically created during network creation and > > +include the whole subnet. Some IPs can be excluded from the pool by > > +passing them explicitly with ``--add-reserved-ips`` option. > > + > > +Still for IPv6 subnets or even big IPv4 ones this might be insufficient. > > +It is impossible to have two bitarrays for a /64 prefix. Even for IPv4 > > +networks a /20 subnet currently requires 8K long bitarrays. And the > > +second 4K is practically useless since the external reservations are way > > +less than the actual reservations. > > + > > +This design extracts IP pool management from the network logic, and pools > > +will become optional. Currently the pool is created based on the > > +network's CIDR. With multiple subnets per network, we should be able to > > +create and add IP pools to a network (and eventually to the > > +corresponding subnet). Each pool will have an optional user friendly > > +`name` so that the end user can refer to it (see instance related > > +operations). > > + > > +The user will be able to obtain dynamically an IP only if we have > > +already defined a pool for a network's subnet.
One would use ``ip=pool`` > > +for the first available IP of the first available pool, or > > +``ip=some-pool-name`` for the first available IP of a specific pool. > > + > > +Any pool related actions will be done via the new ``--pool`` option. > > + > > +In order to add a pool a relevant subnet should pre-exist. Overlapping > > +pools won't be allowed. For example: > > + > > +:: > > + > > + gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1 > > + gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1 > > + gnt-network modify --pool add:10.0.0.100 net1 > > This is very cool. We can simplify things by making reserved IPs just > another IP pool. Some examples: > > gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1,reserved=true net1 > gnt-network modify --pool add:10.0.0.7-10.0.0.20,reserved=true net1 > gnt-network modify --pool add:10.0.0.100,reserved=true net1 > > This way we don't have to have 2 places internally to keep track of > internal and external reservations. We can just reuse the same > concepts and the same code. Naturally, for the case of reserved IP > pools we would not construct the bitarray. > > What do you think? Do you see any problems with this? > > > + > > +will first parse and find the ranges. Then for each range, Ganeti will > > +try to find a matching subnet meaning that a pool must be a subrange of > > +the subnet. If found, the range with empty reservations will be appended > > +to the list of the subnet's pools. Moreover, logic must be added to > > +reserve the IPs that are currently in use by instances of this network. > > + > > +During pool removal, logic should be added to split pools if ranges > > +given overlap existing ones.
For example: > > + > > +:: > > + > > + gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1 > > + > > +will split the pool previously added (10-100) into two new ones; > > +10-19 and 51-100. The corresponding bitarrays will be trimmed > > +accordingly. The name will be preserved. > > + > > +The same things apply to external reservations. Just like now, > > +modifications will take place via the ``--add|remove-reserved-ips`` > > +option. Logic must be added to support IP ranges. > > + > > +Based on the aforementioned we propose the following changes: > > + > > +1) Change the IP pool representation in config data. > > + > > + Existing `reservations` and `external_reservations` bitarrays will be > > + removed. Instead, for each subnet we will have: > > + > > + * `pools`: List of (IP range, reservations bitarray) tuples. > > + * `external`: List of IP ranges > > + > > + For external ranges the reservations bitarray is not needed > > + since this will be all 1's. > > + > > +2) Change the network module logic. > > + > > + The above changes should be done in the network module and be > transparent > > + to the rest of the Ganeti code. If a random IP from the networks is > > + requested, Ganeti searches for an available IP from the first pool of > > + the first subnet. If it is full it gets to the next pool. Then to the > > + next subnet and so on. Of course the `external` IP ranges will be > > + excluded. If an IP is explicitly requested, Ganeti will try to find a > > + matching subnet.
Its pools and external will be checked for > > + availability. All this logic will be extracted in a separate class > > + with helper methods for easier manipulation of IP ranges and > > + bitarrays. > > + > > +3) Changes in config module. > > + > > + We should not have instances with the same IP inside the same network. > > + We introduce _AllIPs() helper config method that will hold all > existing > > + (IP, network) tuples. Config logic will check this list as well > > + before passing it to TemporaryReservationManager. > > + > > +4) Change the query mechanism. > > + > > + Since we have more than one subnet the new `subnets` field will > > + include a list of: > > + > > + * cidr: IPv4 or IPv6 CIDR > > + * gateway: IPv4 or IPv6 address > > + * dhcp: True or False > > + * name: The user friendly name for the subnet > > + > > + Since we want to support small pools inside big subnets, current query > > + results are not practical as far as the `map` field is concerned. It > > + should be replaced with the new `pools` field for each subnet, which > will > > + contain: > > + > > + * start: The first IP of the pool > > + * end: The last IP of the pool > > + * map: A string with 'X' for reserved IPs (either external or not) and > > + with '.' for all available ones inside the pool > > + > > + > > + > > +Multiple IPs per NIC > > +++++++++++++++++++++ > > + > > +Currently IP is a simple string inside the NIC object and there is a > > +one-to-one mapping between the `ip` and the `network` slots. The whole > > +logic behind this is that a NIC belongs to a network (cable) and > > +inherits its mode and link. This rationale will not change. > > + > > +Since this design adds support for multiple subnets per network, a NIC > > +must be able to obtain multiple IPs from various subnets of the same > > +network.
Thus we change the `ip` slot into a list. > > + > > +During instance related operations one would use something like: > > + > > +:: > > + > > + gnt-instance add --net 0:ip=192.0.2.4,ip=pool,ip=some-pool-name,network=network1 inst1 > > + > > + > > +This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4, > > +"pool", "some-pool-name"]) and finally passed to the corresponding > opcode. > > +Based on the previous example, here the first IP will match subnet1, the > > +second IP will be retrieved from the first available pool of the first > > +available subnet, and the third from the pool named some-pool-name. > > + > > +During instance modification, the `ip` option will refer to the first IP > > +of the NIC, whereas the `ipx` will refer to the X'th IP. > > + > > + > > +Configuration changes > > +--------------------- > > + > > +IPRange config object: > > + Introduce new config object that will hold ranges needed by pools, and > > + reservations. It will be either a tuple of (start, size, end) or a > > + simple string. The `end` is redundant and can derive from start and > > + size in runtime, but will appear in the representation for readability > > + and debug reasons. > > This is good. Internally we keep only (start, end) or (start, size) but > when we print, we show the actual triple (start, size, end). Cool! > > Cheers, > Jose > > > + > > +Pool config object: > > + Introduce new config object to represent a single subnet's pool. It > > + will have the `range`, `reservations`, `name` slots. The range slot > > + will be an IPRange config object, the reservations a bitarray and the > > + name a simple string. > > + > > +Subnet config object: > > + Introduce new config object with slots: `name`, `uuid`, `cidr`, > > + `gateway`, `dhcp`, `pools`, `external`. Pools is a list of Pool config > > + objects. External is a list of IPRange config objects. All ranges must > > + reside inside the subnet's CIDR. Only `cidr` will be mandatory.
The > > + `dhcp` attribute will be False by default. > > + > > +Network config objects: > > + The L3 and the IP pool representation will change. Specifically all > > + slots besides `name`, `mac_prefix`, and `tag` will be removed. Instead > > + the slot `subnets` with a list of Subnet config objects will be added. > > + > > +NIC config objects: > > + NIC's network slot will be removed and the `ip` slot will be modified > > + to a list of strings. > > + > > +KVM runtime files: > > + Any change done in config data must be done also in KVM runtime files. > > + For this purpose the existing _UpgradeSerializedRuntime() can be used. > > + > > + > > +Exported variables > > +------------------ > > + > > +The exported variables during instance related operations will be just > > +like Linux uses aliases for interfaces. Specifically: > > + > > +``IP:i`` for the ith IP. > > + > > +``NETWORK_*:i`` for the ith subnet. * is SUBNET, GATEWAY, DHCP. > > + > > +In case of hooks those variables will be prefixed with ``INSTANCE_NICn`` > > +for the nth NIC. > > + > > + > > +Backwards Compatibility > > +----------------------- > > + > > +The existing networks representation will be internally modified. > > +They will obtain one subnet, and one pool with range the whole subnet. > > + > > +During `gnt-network add` if the deprecated ``--network`` option is > passed > > +will still create a network with one subnet, and one IP pool with the > > +size of the subnet. Otherwise ``--subnet`` and ``--pool`` options > > +will be needed. > > + > > +The query mechanism will also include the deprecated `map` field. For > the > > +newly created network this will contain only the mapping of the first > > +pool. The deprecated `network`, `gateway`, `network6`, `gateway6` fields > > +will point to the first IPv4 and IPv6 subnet accordingly. > > + > > +During instance related operation the `ip` argument of the ``--net`` > > +option will refer to the first IP of the NIC. 
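To make the upgrade path in this section concrete, here is a rough Python sketch of how an existing network might be rewritten into the new layout. The keys mirror the slot and field names used in this document (`network`, `gateway`, `network6`, `gateway6`, `subnets`, `pools`, `external`), but the function name `upgrade_network`, the dict-based representation and the choice to create a pool only for the IPv4 subnet are assumptions made for illustration, not what the patch implements:

```python
import ipaddress


def upgrade_network(old):
    """Convert an old-style network dict into the proposed layout.

    Hypothetical sketch: one subnet per existing CIDR, and one pool
    spanning the whole IPv4 subnet (assumed, since today's pool is
    derived from the IPv4 CIDR).
    """
    subnets = []
    for cidr_key, gw_key in (("network", "gateway"), ("network6", "gateway6")):
        cidr = old.get(cidr_key)
        if not cidr:
            continue  # L2 only network, or no CIDR of this IP version
        net = ipaddress.ip_network(cidr)
        subnet = {
            "cidr": str(net),
            "gateway": old.get(gw_key),
            "dhcp": False,  # the design's stated default
            "pools": [],
            "external": [],
        }
        if net.version == 4:
            # One pool whose range is the whole subnet.
            subnet["pools"].append(
                {"start": str(net[0]), "end": str(net[-1]), "name": None}
            )
        subnets.append(subnet)
    return {
        "name": old["name"],
        "mac_prefix": old.get("mac_prefix"),
        "subnets": subnets,
    }
```

The existing reservation bitarrays would still have to be carried over into the new pool; that step is left out here because the document does not spell out the new on-disk format.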
> > + > > +Hooks and scripts will still have the same environment exported in case > > +of single IP per NIC. > > + > > + > > +.. vim: set textwidth=72 : > > +.. Local Variables: > > +.. mode: rst > > +.. fill-column: 72 > > +.. End: > > -- > > 1.7.10.4 > > > Cheers, Helga
