Hi, since Jose is no longer on the team, could anyone review this design doc?
Thanks in advance,
dimara

* Dimitris Aragiorgis <[email protected]> [2014-07-04 15:26:27 +0300]:
> Hi!
>
> I think I broke any (negative) record as far as the "response latency" is
> concerned! Really sorry about that.
>
> Lots of fixes/urgent things have gotten in the way and made me postpone it
> every time.
>
> Since you are working on relevant things right now (just saw the IP
> reservations made by Petr), I would like the design to be clear and not have
> any conflicts with the existing implementation.
>
> Let's resurrect this thread, shall we?
>
> * Jose A. Lopes <[email protected]> [2014-05-13 09:50:23 +0200]:
>
> .. 50+ days ago :)
>
> > On Apr 23 16:22, Dimitris Aragiorgis wrote:
> > > This design doc describes how to extend the existing network
> > > management and make it more flexible and able to deal with more
> > > generic use cases. It proposes support for:
> > >
> > > - Networks with multiple subnets
> > > - Subnets with multiple IP pools
> > > - NICs with multiple IPs from various subnets of a single network
> > >
> > > Signed-off-by: Dimitris Aragiorgis <[email protected]>
> > > ---
> > >
> > > Hello team,
> > >
> > > After our discussions during GanetiCon 2013 and a recent discussion with
> > > Jose, I'm sending the revised design document for networks, incorporating
> > > all your comments.
> > >
> > > Looking forward to your feedback,
> > > dimara
> > >
> > >  Makefile.am             |   1 +
> > >  doc/design-draft.rst    |   1 +
> > >  doc/design-network2.rst | 400 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 402 insertions(+)
> > >  create mode 100644 doc/design-network2.rst
> > >
> > > diff --git a/Makefile.am b/Makefile.am
> > > index f2589e6..140608f 100644
> > > --- a/Makefile.am
> > > +++ b/Makefile.am
> > > @@ -586,6 +586,7 @@ docinput = \
> > >   doc/design-multi-reloc.rst \
> > >   doc/design-multi-version-tests.rst \
> > >   doc/design-network.rst \
> > > + doc/design-network2.rst \
> > >   doc/design-node-add.rst \
> > >   doc/design-oob.rst \
> > >   doc/design-openvswitch.rst \
> > > diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> > > index 55bed7c..926f35b 100644
> > > --- a/doc/design-draft.rst
> > > +++ b/doc/design-draft.rst
> > > @@ -23,6 +23,7 @@ Design document drafts
> > >     design-node-security.rst
> > >     design-systemd.rst
> > >     design-cpu-speed.rst
> > > +   design-network2.rst
> > >
> > >  .. vim: set textwidth=72 :
> > >  .. Local Variables:
> > > diff --git a/doc/design-network2.rst b/doc/design-network2.rst
> > > new file mode 100644
> > > index 0000000..84a44e8
> > > --- /dev/null
> > > +++ b/doc/design-network2.rst
> > > @@ -0,0 +1,400 @@
> > > +============================
> > > +Network Management (revised)
> > > +============================
> > > +
> > > +.. contents:: :depth: 4
> > > +
> > > +This is a design document detailing how to extend the existing network
> > > +management and make it more flexible and able to deal with more generic
> > > +use cases.
> > > +
> > > +
> > > +Current state and shortcomings
> > > +------------------------------
> > > +
> > > +Currently in Ganeti, networks are tightly connected with IP pools,
> > > +since creation of a network implies the existence of one subnet
> > > +and the corresponding IP pool.
> > > +This design does not allow common
> > > +scenarios like:
> > > +
> > > +- L2 only networks
> > > +- IPv6 only networks
> > > +- Networks without an IP pool
> > > +- Networks with an IPv6 pool
> > > +- Networks with multiple IP pools (alternative to externally reserving
> > > +  IPs)
> > > +
> > > +Additionally one cannot have multiple IP pools inside one network.
> > > +Finally, from the instance perspective, a NIC cannot get more than one
> > > +IP (v4 and v6).
> > > +
> > > +
> > > +Proposed changes
> > > +----------------
> > > +
> > > +In order to deal with the above shortcomings, we propose to extend
> > > +the existing networks in Ganeti and support:
> > > +
> > > +a) Networks with multiple subnets
> > > +b) Subnets with multiple IP pools
> > > +c) NICs with multiple IPs from various subnets of a single network
> > > +
> > > +These changes bring up some design and implementation issues that we
> > > +discuss in the following sections.
> > > +
> > > +Semantics
> > > ++++++++++
> > > +
> > > +Quoting the initial network management design doc "an IP pool consists
> > > +of two bitarrays. Specifically the ``reservations`` bitarray which holds
> > > +all IP addresses reserved by Ganeti instances and the ``external
> > > +reservations`` bitarray with all IPs that are excluded from the IP pool
> > > +and cannot be assigned automatically by Ganeti to instances (via
> > > +ip=pool)".
> > > +
> > > +Without violating those semantics, here, we clarify the following
> > > +definitions.
> > > +
> > > +**network**: A cluster level taggable configuration object with a
> > > +user-provided name (e.g. network1, network2), UUID and MAC prefix.
> > > +
> > > +**L2**: The `mode` and `link` with which we connect a network to a
> > > +nodegroup. A NIC attached to a network will inherit this info, just like
> > > +connecting an Ethernet cable to a physical NIC. In this sense we only
> > > +have one L2 info per NIC.
> > > +
> > > +**L3**: A CIDR and a gateway related to the network. Since a NIC can
> > > +have multiple IPs on the same cable each network can have multiple L3
> > > +info with the restriction that they do not overlap with each other.
> >
> > Hi,
> >
> > Great design document. Great job!
> >
>
> Thanks.
>
> > I would like to ask a few things.
> >
> > Is the gateway optional?
> >
>
> Yes. The gateway will be optional just like it currently is. The use case is
> private networks that do not have a default route.
>
> > > +
> > > +**subnet**: A subnet is the above L3 info plus some additional information
> > > +(see below).
> > > +
> > > +**ip**: A valid IP should reside in a network's subnet, and should not
> > > +be used by more than one instance. An IP can be either obtained dynamically
> > > +from a pool or requested explicitly from a subnet (or a pool).
> > > +
> > > +**range**: Sequential IPs inside one subnet calculated either from the
> > > +first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and
> > > +the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can
> > > +also be thought of as an IP range with size=1 (see configuration
> > > +changes).
> > > +
> > > +**reservations**: All IPs that are used by instances in the cluster at
> > > +any time.
> > > +
> > > +**external reservations**: All IPs that are supposed to be reserved
> > > +by the admin for either some external component or specific instances.
> > > +If one instance requests an external IP explicitly (ip=192.0.2.100),
> > > +Ganeti will allow the operation only if ``--force`` is given. Still, the
> > > +admin can externally reserve an IP that is already in use by an
> > > +instance, as happens now. This helps to reserve an IP for future use and
> > > +at the same time prevent any possible race between the instance that
> > > +releases this IP and another that tries to retrieve it.
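
A side note on the **range** definition above: the two ways of describing a range (start plus size, or start plus end) can be checked against each other with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the `IPRange` class and its method names are hypothetical and not part of the patch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


@dataclass(frozen=True)
class IPRange:
    """Hypothetical run of sequential addresses, stored as (start, size)."""
    start: Address
    size: int

    @classmethod
    def from_start_size(cls, start, size):
        if size < 1:
            raise ValueError("a range needs at least one address")
        return cls(ipaddress.ip_address(start), size)

    @classmethod
    def from_start_end(cls, start, end):
        first = ipaddress.ip_address(start)
        last = ipaddress.ip_address(end)
        if last < first:
            raise ValueError("end precedes start")
        return cls(first, int(last) - int(first) + 1)

    @property
    def end(self):
        # Derived at runtime from start and size.
        return self.start + (self.size - 1)

    def __contains__(self, ip):
        return self.start <= ipaddress.ip_address(ip) <= self.end

    def within(self, cidr):
        """True if the whole range lies inside the given subnet CIDR."""
        net = ipaddress.ip_network(cidr)
        return self.start in net and self.end in net
```

With the numbers from the definition, `IPRange.from_start_size("192.0.2.10", 10)` and `IPRange.from_start_end("192.0.2.10", "192.0.2.19")` describe the same range, and a single IP is simply a range with `size=1`.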
> > > +
> > > +**pool**: A (range, reservations, name) tuple from which instances can
> > > +dynamically obtain an IP. Reservations is a bitarray with
> > > +length the size of the range, and is needed so that we know which IPs
> > > +are available at any time without querying all instances. The use of
> > > +name is explained below. A subnet can have multiple pools.
> > > +
> > > +
> > > +Split L2 from L3
> > > ++++++++++++++++++
> > > +
> > > +Currently networks in Ganeti do not separate L2 from L3. This means
> > > +that one cannot use L2 only networks. The reason is that the CIDR
> > > +(passed currently with the ``--network`` option) and the derived IP pool
> > > +are mandatory. This design makes L3 info optional. This way we can have
> > > +an L2 only network just by connecting a Ganeti network to a nodegroup
> > > +with the desired `mode` and `link`. Then one could add one or more subnets
> > > +to the existing network.
> > > +
> > > +
> > > +Multiple Subnets per Network
> > > ++++++++++++++++++++++++++++++
> > > +
> > > +Currently the IPv4 CIDR is mandatory for a network. Also a network can
> > > +obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will
> > > +be lifted.
> > > +
> > > +This design doc introduces support for multiple subnets per network. The
> > > +L3 info will be moved inside the subnet. A subnet will have a `name` and
> > > +a `uuid` just like NIC and Disk config objects. Additionally it will contain
> > > +the `dhcp` flag which is explained below, and the `pools` and `external`
> > > +fields which are mentioned in the next section. Only the `cidr` will be
> > > +mandatory.
> > > +
> > > +Any subnet related actions will be done via the new ``--subnet`` option.
> > > +Its syntax will be similar to ``--net``.
> > > +
> > > +The network's subnets must not overlap with each other.
> > > +Logic will
> > > +validate any operations related to reserving/releasing of IPs and check
> > > +whether a requested IP is included inside one of the network's subnets.
> > > +Just like currently, the L3 info will be exported to NIC configuration
> > > +hooks and scripts as environment variables. The example below adds
> > > +subnets to a network:
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1
> > > +  gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1
> > > +
> > > +To remove a subnet from a network one should use:
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --subnet some-ident:remove network1
> > > +
> > > +where some-ident can be either a CIDR, a name or a UUID. Ganeti will
> > > +allow this operation only if no instances use IPs from this subnet.
> > > +
> > > +Since DHCP is allowed only for a single CIDR on the same cable, the
> > > +subnet must have a `dhcp` flag. Logic must not allow more than one
> > > +subnet of the same version in the same network to have dhcp enabled. To
> > > +modify a subnet's name or dhcp flag one could use:
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1
> > > +
> > > +This would search for a registered subnet that matches the identifier,
> > > +disable DHCP on it and change its name. If ``dhcp=true`` is passed,
> > > +logic will first check if another subnet of the same version has dhcp
> > > +enabled.
> >
> > Could you please help me understand what 'subnet of the same version'
> > means? I am not familiar with this terminology.
> >
>
> I mean v4 and v6. As far as I know, we cannot have dhcp enabled on
> multiple subnets. So we should have max one v4 subnet and one v6 subnet
> with dhcp enabled. Maybe I should rephrase that, to be more clear.
>
> > Also, is it the case that the 'dhcp' parameter is meant only for
> > validation purposes?
> > In other words, is Ganeti enabling DHCP here?
> > If this parameter is only used for validation purposes, we have to be
> > careful not to mislead people into thinking that Ganeti is actually
> > starting a DHCP service.
> >
>
> This parameter is to be exported to ifup scripts and hooks. Ganeti will
> just check the aforementioned constraint. And yes we have to make clear
> that Ganeti has nothing to do with a DHCP service. I will add a line
> here noting it.
>
> > > +
> > > +Changing the CIDR or the gateway of a subnet should also be supported.
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1
> > > +  gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1
> > > +  gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1
> > > +
> > > +Before expanding a subnet, logic should check for overlapping
> > > +subnets. Shrinking the subnet should be allowed only if the ranges
> > > +that are about to be trimmed are not included either in pool
> > > +reservations or external ranges.
> > > +
> > > +
> > > +Multiple IP pools per Subnet
> > > ++++++++++++++++++++++++++++++
> > > +
> > > +Currently IP pools are automatically created during network creation and
> > > +include the whole subnet. Some IPs can be excluded from the pool by
> > > +passing them explicitly with the ``--add-reserved-ips`` option.
> > > +
> > > +Still for IPv6 subnets or even big IPv4 ones this might be insufficient.
> > > +It is impossible to have two bitarrays for a /64 prefix. Even for IPv4
> > > +networks a /20 subnet currently requires 8K long bitarrays. And the
> > > +second 4K is practically useless since the external reservations are way
> > > +less than the actual reservations.
> > > +
> > > +This design extracts IP pool management from the network logic, and pools
> > > +will become optional. Currently the pool is created based on the
> > > +network's CIDR.
> > > +With multiple subnets per network, we should be able to
> > > +create and add IP pools to a network (and eventually to the
> > > +corresponding subnet). Each pool will have an optional user friendly
> > > +`name` so that the end user can refer to it (see instance related
> > > +operations).
> > > +
> > > +The user will be able to obtain dynamically an IP only if we have
> > > +already defined a pool for a network's subnet. One would use ``ip=pool``
> > > +for the first available IP of the first available pool, or
> > > +``ip=some-pool-name`` for the first available IP of a specific pool.
> > > +
> > > +Any pool related actions will be done via the new ``--pool`` option.
> > > +
> > > +In order to add a pool a relevant subnet should pre-exist. Overlapping
> > > +pools won't be allowed. For example:
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1
> > > +  gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1
> > > +  gnt-network modify --pool add:10.0.0.100 net1
> >
> > This is very cool. We can simplify things by making reserved IPs just
> > another IP pool. Some examples:
> >
> >   gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1,reserved=true net1
> >   gnt-network modify --pool add:10.0.0.7-10.0.0.20,reserved=true net1
> >   gnt-network modify --pool add:10.0.0.100,reserved=true net1
> >
> > This way we don't have to have 2 places internally to keep track of
> > internal and external reservations. We can just reuse the same
> > concepts and the same code. Naturally, for the case of reserved IP
> > pools we would not construct the bitarray.
> >
> > What do you think? Do you see any problems with this?
> >
>
>
> Well I am not very fond of the `reserved` attribute in the `--pool`
> option. I would prefer to keep the old interface, i.e.
> --add|remove-reserved-ips, and enhance it with IP range support.
> In other words, the user interface would be:
>
>   gnt-network modify --pool add:192.0.2.10-192.0.2.15,name=pool1
>                      --pool add:10.0.0.8/29,name=pool2
>                      --pool add:10.0.0.40-10.0.0.45,name=pool3
>                      --add-reserved-ips 192.0.2.15,10.0.0.8-10.0.0.15,10.2.4.5
>                      net1
>
> This will create something like:
>
> net1 {
>   subnets [
>     uuid1 {
>       name: subnet1
>       cidr: 192.0.2.0/24
>       pools: [
>         {range:Range(192.0.2.10, 192.0.2.15), reservations: 000000, name:pool1}
>       ]
>       reserved: [192.0.2.15]
>     }
>     uuid2 {
>       name: subnet2
>       cidr: 10.0.0.0/24
>       pools: [
>         {range:10.0.0.8/29, reservations: 00000000, name:pool2}
>         {range:10.0.0.40-10.0.0.45, reservations: 000000, name:pool3}
>       ]
>       reserved: [Range(10.0.0.8, 10.0.0.15), 10.2.4.5]
>     }
>   ]
> }
>
> Range(start, end) will be some json representation of an IPRange()
>
> This way I see the following advantages:
>
> 1) Keep the existing semantics for pools and external reservations
> 2) Each list has similar entries: one has pools, the other has ranges.
>    The pool must have a bitarray, and has an optional name.
>    It is meaningless to add a name and a bitarray (as you said) to
>    external ranges.
> 3) Each list must not have overlapping ranges. Still external
>    reservations can overlap with pools.
> 4) The --pool option supports the add|remove|modify commands just like
>    `--net` and `--disk` and operates on single entities (a restriction
>    that is not needed for external reservations). Plus the modify
>    command is meaningless with reserved=true.
> 5) Another thing, and probably the most important, is that in order to
>    get the first available IP, only the reserved list must be checked
>    for conflicts. The ipaddr.summarize_address_range(first, last) could
>    be very helpful.
>
>
> If everything was under pools, like you say, for any operation (ip
> reservation, pool creation, pool removal, etc) we would have to parse
> the whole list and add logic (several if's) to separate between
> actual pools or external ranges.
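
To illustrate point 5 above, here is a rough Python sketch of picking the first available IP from a pool while skipping external reservations. It uses `ipaddress.summarize_address_range()` from the standard library (the stdlib counterpart of the `ipaddr` call mentioned above). The function name `first_free_ip` and its argument layout are assumptions made up for this example, and the reservations bitarray is modelled as a plain list of booleans.

```python
import ipaddress


def first_free_ip(pool_start, reservations, reserved_ranges):
    """Return the first pool address that is neither used nor externally reserved.

    pool_start      -- first address of the pool, as a string
    reservations    -- list of bools, one per pool address (True = in use)
    reserved_ranges -- list of (first, last) string pairs (external ranges)
    Returns an address object, or None if the pool has no usable address.
    """
    start = ipaddress.ip_address(pool_start)
    # Collapse every external range into CIDR blocks for simple membership tests.
    blocked = [
        net
        for first, last in reserved_ranges
        for net in ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last))
    ]
    for offset, used in enumerate(reservations):
        if used:
            continue
        candidate = start + offset
        if any(candidate in net for net in blocked):
            continue
        return candidate
    return None
```

With the layout above, pool1 (192.0.2.10-192.0.2.15) would hand out 192.0.2.10 first and never 192.0.2.15, while pool2 (10.0.0.8/29) would yield nothing because the whole range 10.0.0.8-10.0.0.15 is externally reserved.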
>
> Moreover --add|remove-reserved-ips should still exist for backwards
> compatibility.
>
> As far as the code reuse you mention, I have in mind an IpRange class
> that will implement basic validation and check methods, while the
> Pool class will extend it with bitarray arithmetic. Helper methods
> will act on lists of either Pools or IpRanges. Additionally we should
> add logic to split ranges (and the corresponding pools).
>
> So, what do you think?
>
> > > +
> > > +will first parse and find the ranges. Then for each range, Ganeti will
> > > +try to find a matching subnet meaning that a pool must be a subrange of
> > > +the subnet. If found, the range with empty reservations will be appended
> > > +to the list of the subnet's pools. Moreover, logic must be added to
> > > +reserve the IPs that are currently in use by instances of this network.
> > > +
> > > +During pool removal, logic should be added to split pools if ranges
> > > +given overlap existing ones. For example:
> > > +
> > > +::
> > > +
> > > +  gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1
> > > +
> > > +will split the pool previously added (10-100) into two new ones;
> > > +10-19 and 51-100. The corresponding bitarrays will be trimmed
> > > +accordingly. The name will be preserved.
> > > +
> > > +The same things apply to external reservations. Just like now,
> > > +modifications will take place via the ``--add|remove-reserved-ips``
> > > +option. Logic must be added to support IP ranges.
> > > +
> > > +Based on the aforementioned we propose the following changes:
> > > +
> > > +1) Change the IP pool representation in config data.
> > > +
> > > +   Existing `reservations` and `external_reservations` bitarrays will be
> > > +   removed. Instead, for each subnet we will have:
> > > +
> > > +   * `pools`: List of (IP range, reservations bitarray) tuples.
> > > +   * `external`: List of IP ranges
> > > +
> > > +   For external ranges the reservations bitarray is not needed
> > > +   since this will be all 1's.
> > > +
> > > +2) Change the network module logic.
> > > +
> > > +   The above changes should be done in the network module and be transparent
> > > +   to the rest of the Ganeti code. If a random IP from the networks is
> > > +   requested, Ganeti searches for an available IP from the first pool of
> > > +   the first subnet. If it is full it gets to the next pool. Then to the
> > > +   next subnet and so on. Of course the `external` IP ranges will be
> > > +   excluded. If an IP is explicitly requested, Ganeti will try to find a
> > > +   matching subnet. Its pools and external will be checked for
> > > +   availability. All this logic will be extracted in a separate class
> > > +   with helper methods for easier manipulation of IP ranges and
> > > +   bitarrays.
> > > +
> > > +3) Changes in config module.
> > > +
> > > +   We should not have instances with the same IP inside the same network.
> > > +   We introduce _AllIPs() helper config method that will hold all existing
> > > +   (IP, network) tuples. Config logic will check this list as well
> > > +   before passing it to TemporaryReservationManager.
> > > +
> > > +4) Change the query mechanism.
> > > +
> > > +   Since we have more than one subnet the new `subnets` field will
> > > +   include a list of:
> > > +
> > > +   * cidr: IPv4 or IPv6 CIDR
> > > +   * gateway: IPv4 or IPv6 address
> > > +   * dhcp: True or False
> > > +   * name: The user friendly name for the subnet
> > > +
> > > +   Since we want to support small pools inside big subnets, current query
> > > +   results are not practical as far as the `map` field is concerned.
> > > +   It should be replaced with the new `pools` field for each subnet,
> > > +   which will contain:
> > > +
> > > +   * start: The first IP of the pool
> > > +   * end: The last IP of the pool
> > > +   * map: A string with 'X' for reserved IPs (either external or not) and
> > > +     with '.' for all available ones inside the pool
> > > +
> > > +
> > > +
> > > +Multiple IPs per NIC
> > > ++++++++++++++++++++++
> > > +
> > > +Currently IP is a simple string inside the NIC object and there is a
> > > +one-to-one mapping between the `ip` and the `network` slots. The whole
> > > +logic behind this is that a NIC belongs to a network (cable) and
> > > +inherits its mode and link. This rationale will not change.
> > > +
> > > +Since this design adds support for multiple subnets per network, a NIC
> > > +must be able to obtain multiple IPs from various subnets of the same
> > > +network. Thus we change the `ip` slot into a list.
> > > +
> > > +During instance related operations one would use something like:
> > > +
> > > +::
> > > +
> > > +  gnt-instance add --net 0:ip=192.0.2.4,ip=pool,ip=some-pool-name,network=network1 inst1
> > > +
> > > +
> > > +This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4,
> > > +"pool", "some-pool-name"]) and finally passed to the corresponding opcode.
> > > +Based on the previous example, here the first IP will match subnet1, the
> > > +second IP will be retrieved from the first available pool of the first
> > > +available subnet, and the third from the pool named some-pool-name.
> > > +
> > > +During instance modification, the `ip` option will refer to the first IP
> > > +of the NIC, whereas the `ipx` will refer to the X'th IP.
> > > +
> > > +
> > > +Configuration changes
> > > +---------------------
> > > +
> > > +IPRange config object:
> > > +  Introduce new config object that will hold ranges needed by pools, and
> > > +  reservations.
> > > +  It will be either a tuple of (start, size, end) or a
> > > +  simple string. The `end` is redundant and can derive from start and
> > > +  size in runtime, but will appear in the representation for readability
> > > +  and debug reasons.
> >
> > This is good. Internally we keep only (start, end) or (start, size) but
> > when we print, we show the actual triple (start, size, end). Cool!
> >
>
> Good. So as soon as we agree on everything, I will send an interdiff (or maybe
> the whole design doc) with all the changes discussed in the thread so that we
> can merge the design doc into master, right?
>
> Again sorry for the late response.
>
> Cheers,
> dimara
>
> > Cheers,
> > Jose
> >
> > > +
> > > +Pool config object:
> > > +  Introduce new config object to represent a single subnet's pool. It
> > > +  will have the `range`, `reservations`, `name` slots. The range slot
> > > +  will be an IPRange config object, the reservations a bitarray and the
> > > +  name a simple string.
> > > +
> > > +Subnet config object:
> > > +  Introduce new config object with slots: `name`, `uuid`, `cidr`,
> > > +  `gateway`, `dhcp`, `pools`, `external`. Pools is a list of Pool config
> > > +  objects. External is a list of IPRange config objects. All ranges must
> > > +  reside inside the subnet's CIDR. Only `cidr` will be mandatory. The
> > > +  `dhcp` attribute will be False by default.
> > > +
> > > +Network config objects:
> > > +  The L3 and the IP pool representation will change. Specifically all
> > > +  slots besides `name`, `mac_prefix`, and `tag` will be removed. Instead
> > > +  the slot `subnets` with a list of Subnet config objects will be added.
> > > +
> > > +NIC config objects:
> > > +  NIC's network slot will be removed and the `ip` slot will be modified
> > > +  to a list of strings.
> > > +
> > > +KVM runtime files:
> > > +  Any change done in config data must be done also in KVM runtime files.
> > > +  For this purpose the existing _UpgradeSerializedRuntime() can be used.
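
As an illustration of how the Pool object described above might behave during the pool removal discussed earlier in the design (removing 192.0.2.20-192.0.2.50 from a 10-100 pool), here is a small Python sketch. The dict layout and the function name `remove_range` are hypothetical stand-ins for the proposed config objects, the bitarray is a plain list of booleans, and the sketch does not check whether the removed addresses are still in use.

```python
import ipaddress


def remove_range(pool, first, last):
    """Cut [first, last] out of a pool and return the remaining pieces.

    pool is a dict with 'start', 'end' (strings), 'reservations'
    (list of bools, one per address) and 'name'.
    """
    base = ipaddress.ip_address(pool["start"])
    bits = pool["reservations"]
    # Translate the removed range into offsets and clamp them to the pool.
    lo = max(int(ipaddress.ip_address(first)) - int(base), 0)
    hi = min(int(ipaddress.ip_address(last)) - int(base), len(bits) - 1)
    if lo > hi:
        return [pool]  # no overlap, nothing to split

    def piece(a, b):
        # Offsets a..b (inclusive) keep their slice of the bitarray and the name.
        return {
            "start": str(base + a),
            "end": str(base + b),
            "reservations": bits[a:b + 1],
            "name": pool["name"],
        }

    pieces = []
    if lo > 0:
        pieces.append(piece(0, lo - 1))
    if hi < len(bits) - 1:
        pieces.append(piece(hi + 1, len(bits) - 1))
    return pieces
```

Applied to the example from the design, the 91-address pool 192.0.2.10-192.0.2.100 comes back as two pools, 10-19 and 51-100, each carrying its trimmed bitarray and the original name.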
> > > +
> > > +
> > > +Exported variables
> > > +------------------
> > > +
> > > +The exported variables during instance related operations will be named
> > > +the way Linux names interface aliases. Specifically:
> > > +
> > > +``IP:i`` for the ith IP.
> > > +
> > > +``NETWORK_*:i`` for the ith subnet. * is SUBNET, GATEWAY, DHCP.
> > > +
> > > +In case of hooks those variables will be prefixed with ``INSTANCE_NICn``
> > > +for the nth NIC.
> > > +
> > > +
> > > +Backwards Compatibility
> > > +-----------------------
> > > +
> > > +The existing networks representation will be internally modified.
> > > +They will obtain one subnet, and one pool with range the whole subnet.
> > > +
> > > +During `gnt-network add`, passing the deprecated ``--network`` option
> > > +will still create a network with one subnet, and one IP pool with the
> > > +size of the subnet. Otherwise ``--subnet`` and ``--pool`` options
> > > +will be needed.
> > > +
> > > +The query mechanism will also include the deprecated `map` field. For the
> > > +newly created network this will contain only the mapping of the first
> > > +pool. The deprecated `network`, `gateway`, `network6`, `gateway6` fields
> > > +will point to the first IPv4 and IPv6 subnet respectively.
> > > +
> > > +During instance related operation the `ip` argument of the ``--net``
> > > +option will refer to the first IP of the NIC.
> > > +
> > > +Hooks and scripts will still have the same environment exported in case
> > > +of single IP per NIC.
> > > +
> > > +
> > > +.. vim: set textwidth=72 :
> > > +.. Local Variables:
> > > +.. mode: rst
> > > +.. fill-column: 72
> > > +.. End:
> > > --
> > > 1.7.10.4
> > >
> > >
> > --
> > Jose Antonio Lopes
> > Ganeti Engineering
> > Google Germany GmbH
> > Dienerstr. 12, 80331, München
> >
> > Registergericht und -nummer: Hamburg, HRB 86891
> > Sitz der Gesellschaft: Hamburg
> > Geschäftsführer: Graham Law, Christine Elizabeth Flores
> > Steuernummer: 48/725/00206
> > Umsatzsteueridentifikationsnummer: DE813741370
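
For reviewers picking this up: the two subnet constraints discussed in the thread (subnets of a network must not overlap, and at most one IPv4 and one IPv6 subnet may have dhcp enabled) could be validated roughly as in the Python sketch below. The function `check_new_subnet` and the `(cidr, dhcp)` tuple layout are assumptions for illustration only; as noted in the thread, the flag is only validated and exported, and Ganeti does not run a DHCP service.

```python
import ipaddress


def check_new_subnet(existing, cidr, dhcp=False):
    """Validate a subnet against those already attached to a network.

    existing -- list of (cidr, dhcp) tuples already on the network
    Raises ValueError if the new subnet overlaps another one, or if dhcp
    is requested while a subnet of the same IP version already has it.
    """
    new = ipaddress.ip_network(cidr)
    for other_cidr, other_dhcp in existing:
        other = ipaddress.ip_network(other_cidr)
        if other.version != new.version:
            continue  # v4 and v6 subnets never conflict with each other
        if new.overlaps(other):
            raise ValueError("%s overlaps %s" % (new, other))
        if dhcp and other_dhcp:
            raise ValueError(
                "dhcp already enabled on IPv%d subnet %s" % (other.version, other))
    return new
```

For a network that already has 10.0.0.0/24 with dhcp enabled, adding 2001::/64 with dhcp would pass, while adding 10.0.0.128/25 (overlap) or 10.0.1.0/24 with dhcp (second IPv4 subnet with dhcp) would be rejected.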
