This design doc is still pending review. ultrotter first accepted the proposal from GrNet for this task; should he review it?
Thanks,
Jose

On May 15 13:40, Dimitris Aragiorgis wrote:
> The ifdown script will be responsible for deconfiguring network
> devices and cleanup changes made by the ifup script. The first
> implementation will target KVM but it could be ported to Xen as well
> especially when Xen hotplug gets implemented.
>
> Signed-off-by: Dimitris Aragiorgis <[email protected]>
> ---
>  Makefile.am           |    1 +
>  doc/design-draft.rst  |    1 +
>  doc/design-ifdown.rst |  156 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 158 insertions(+)
>  create mode 100644 doc/design-ifdown.rst
>
> diff --git a/Makefile.am b/Makefile.am
> index f5287f6..47b127d 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -587,6 +587,7 @@ docinput = \
>    doc/design-htools-2.3.rst \
>    doc/design-http-server.rst \
>    doc/design-hugepages-support.rst \
> +  doc/design-ifdown.rst \
>    doc/design-impexp2.rst \
>    doc/design-internal-shutdown.rst \
>    doc/design-kvmd.rst \
> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> index e3531ee..f6a4e49 100644
> --- a/doc/design-draft.rst
> +++ b/doc/design-draft.rst
> @@ -24,6 +24,7 @@ Design document drafts
>     design-systemd.rst
>     design-cpu-speed.rst
>     design-performance-tests.rst
> +   design-ifdown.rst
>
>  .. vim: set textwidth=72 :
>  .. Local Variables:
> diff --git a/doc/design-ifdown.rst b/doc/design-ifdown.rst
> new file mode 100644
> index 0000000..7626da9
> --- /dev/null
> +++ b/doc/design-ifdown.rst
> @@ -0,0 +1,156 @@
> +======================================
> +Design for adding ifdown script to KVM
> +======================================
> +
> +.. contents:: :depth: 4
> +
> +This is a design document about adding support for an ifdown script responsible
> +for deconfiguring network devices and cleanup changes made by the ifup script. The
> +first implementation will target KVM but it could be ported to Xen as well
> +especially when hotplug gets implemented.
> +
> +
> +Current state and shortcomings
> +==============================
> +
> +Currently, KVM before instance startup, instance migration and NIC hotplug, it
> +creates a tap and invokes explicitly the kvm-ifup script with the relevant
> +environment (INTERFACE, MAC, IP, MODE, LINK, TAGS, and all the network info if
> +any; NETWORK\_SUBNET, NETWORK\_TAGS, etc).
> +
> +For Xen we have the `vif-ganeti` script (associated with vif-script hypervisor
> +parameter). The main difference is that Xen calls it by itself by passing it as
> +an extra option in the configuration file.
> +
> +This ifup script can do several things; bridge a tap to a bridge, add ip rules,
> +update a external DNS or DHCP server, enable proxy ARP or proxy NDP, issue
> +openvswitch commands, etc. In general we can divide those actions in two
> +categories:
> +
> +1) Commands that change the state of the host
> +2) Commands that change the state of external components.
> +
> +Currently those changes do not get cleaned up or modified upon instance
> +shutdown, remove, migrate, or NIC hot-unplug. Thus we have stale entries in
> +hosts and most important might have stale/invalid configuration on external
> +components like routers that could affect connectivity.
> +
> +A workaround could be hooks but:
> +
> +1) During migrate hooks the environment is the one held in config data
> +and not in runtime files. The NIC configuration might have changed on
> +master but not on the running KVM process (unless hotplug is used).
> +Plus the NIC order in config data might not be the same one on the KVM
> +process.
> +
> +2) On instance modification, changes are not available on hooks. With
> +other words we do not know the configuration before and after modification.
> +
> +Since Ganeti is the orchestrator and is the one who explicitly configures
> +host devices (tap, vif) it should be the one responsible for cleanup/
> +deconfiguration.
> +Especially on a SDN approach this kind of script might
> +be useful to cleanup flows in the cluster in order to ensure correct paths
> +without ping pongs between hosts or connectivity loss for the instance.
> +
> +
> +Proposed Changes
> +================
> +
> +We add an new script, kvm-ifdown that is explicitly invoked after:
> +
> +1) instance shutdown on primary node
> +2) successful instance migration on source node
> +3) failed instance migration on target node
> +4) successful NIC hot-remove on primary node
> +
> +If an administrator's custom ifdown script exists (e.g. `kvm-ifdown-custom`),
> +the `kvm-ifdown` script executes that script, as happens with `kvm-ifup`.
> +
> +Along with that change we should rename custom ifup script from
> +`kvm-vif-bridge` (which does not make any sense) to `kvm-ifup-custom`.
> +
> +In contrary to `kvm-ifup`, one cannot rely on `kvm-ifdown` script to be
> +called. A node might die just after a successful migration or after an
> +instance shutdown. In that case, all "undo" operations will not be invoked.
> +Thus, this script should work "on a best effort basis" and the network
> +should not rely on the script being called or being successful. Additionally
> +it should modify *only* the node local dynamic configs (routes, arp entries,
> +SDN, firewalls, etc.), whereas static ones (DNS, DHCP, etc.) should be modified
> +via hooks.
> +
> +
> +Implementation Details
> +======================
> +
> +1) Where to get the NIC info?
> +
> +We cannot account on config data since it might have changed. So the only
> +place we keep our valid data is inside the runtime file. During instance
> +modifications (NIC hot-remove, hot-modify) we have the NIC object from
> +the RPC. We take its UUID and search for the corresponding entry in the
> +runtime file to get further info.
> +After instance shutdown and migration
> +we just take all NICs from the runtime file and invoke the ifdown script
> +for each one
> +
> +2) Where to find the corresponding TAP?
> +
> +Currently TAP names are kept under
> +/var/run/ganeti/kvm-hypervisor/nics/<instance>/<nic\_index>.
> +This is not enough. As told above a NIC's index might change during instance's
> +life. An example will make things clear:
> +
> +* The admin starts an instance with three NICs.
> +* The admin removes the second without hotplug.
> +* The admin removes the first with hotplug.
> +
> +The index that will arrive with the RPC will be 1 and if we read the relevant
> +NIC file we will get the tap of the NIC that has been removed on second
> +step but is still existing in the KVM process.
> +
> +So upon TAP creation we write another file with the same info but named
> +after the NIC's UUID. The one named after its index can be left
> +for compatibility (Ganeti does not use it; external tools might)
> +Obviously this info will not be available for old instances in the cluster.
> +The ifdown script should be aware of this corner case.
> +
> +3) What should we cleanup/deconfigure?
> +
> +Upon NIC hot-remove we obviously want to wipe everything. But on instance
> +migration we don't want to reset external configuration like DNS. So we choose
> +to pass an extra positional argument to the ifdown script (it already has the
> +TAP name) that will reflect the context it was invoked with. Please note that
> +de-configuration of external components is not encouraged and should be
> +done via hooks. Still we could easily support it via this extra argument.
> +
> +4) What will be the script environment?
> +
> +In general the same environment passed to ifup script. Except instance's
> +tags. Those are the only info not kept in runtime file and it can
> +change between ifup and ifdown script execution.
> +The ifdown
> +script must be aware of it and should cleanup everything that ifup script
> +might setup depending on instance tags (e.g. firewalls, etc)
> +
> +
> +Configuration Changes
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +1) The `kvm-ifdown` script will be an extra file installed under the same dir
> +   `kvm-ifup` resides. We could have a single script (and symbolic links to it)
> +   that shares the same code, where a second positional argument or an extra
> +   environment variable would define if we are bringing the interface up or
> +   down. Still this is not the best practice since it is not equivalent
> +   with how KVM uses `script` and `downscript` in the `netdev` option; scripts
> +   are different files that get the tap name as positional argument. Of course
> +   common code will go in `net-common` so that it can be sourced from either
> +   Xen or KVM specific scripts.
> +
> +2) An extra file written upon TAP creation named after the NIC's UUID and
> +   including the TAP's name. Since this should be the correct file to keep
> +   backwards compatibility we create a symbolic link named after the NIC's
> +   index and pointing to this new file.
> +
> +.. vim: set textwidth=72 :
> +.. Local Variables:
> +.. mode: rst
> +.. fill-column: 72
> +.. End:
> --
> 1.7.10.4
>

--
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
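
As a reading aid for whoever picks up the review: the second configuration change (a file named after the NIC's UUID that holds the TAP name, plus a compatibility symlink named after the NIC's index) might look roughly like the Python sketch below. The helper names `write_nic_files` and `find_tap` and the `nics_dir` argument are made up for illustration and are not Ganeti APIs; only the `<instance>/<nic_index>` layout comes from the patch.

```python
import os
import tempfile


def write_nic_files(nics_dir, instance, index, uuid, tap):
    """Store the TAP name under the NIC's UUID; link the NIC index to it.

    Hypothetical helper: the layout <nics_dir>/<instance>/<name> mirrors the
    path quoted in the design doc, everything else is an assumption.
    """
    inst_dir = os.path.join(nics_dir, instance)
    os.makedirs(inst_dir, exist_ok=True)
    with open(os.path.join(inst_dir, uuid), "w") as fd:
        fd.write(tap)
    index_path = os.path.join(inst_dir, str(index))
    if os.path.lexists(index_path):
        os.remove(index_path)  # replace a stale index entry
    os.symlink(uuid, index_path)  # relative link, resolved inside inst_dir


def find_tap(nics_dir, instance, uuid):
    """Return the TAP name recorded for a NIC UUID, or None if unknown."""
    try:
        with open(os.path.join(nics_dir, instance, uuid)) as fd:
            return fd.read().strip()
    except FileNotFoundError:
        # Instance started before UUID-named files existed.
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as nics_dir:
        write_nic_files(nics_dir, "inst1", 0, "uuid-a", "tap0")
        write_nic_files(nics_dir, "inst1", 1, "uuid-b", "tap1")
        print(find_tap(nics_dir, "inst1", "uuid-b"))
        print(find_tap(nics_dir, "inst1", "uuid-old"))
```

The lookup never consults the index, so a NIC whose index shifted after a non-hotplug removal still resolves to its own TAP. A missing UUID file corresponds to the old-instance corner case the design says the ifdown script must tolerate.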
