This design doc is still pending review.

ultrotter first accepted the proposal from GrNet for this task. Should
he review it?

Thanks,
Jose

On May 15 13:40, Dimitris Aragiorgis wrote:
> The ifdown script will be responsible for deconfiguring network
> devices and cleaning up changes made by the ifup script. The first
> implementation will target KVM, but it could be ported to Xen as well,
> especially when Xen hotplug gets implemented.
> 
> Signed-off-by: Dimitris Aragiorgis <[email protected]>
> ---
>  Makefile.am           |    1 +
>  doc/design-draft.rst  |    1 +
>  doc/design-ifdown.rst |  156 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 158 insertions(+)
>  create mode 100644 doc/design-ifdown.rst
> 
> diff --git a/Makefile.am b/Makefile.am
> index f5287f6..47b127d 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -587,6 +587,7 @@ docinput = \
>       doc/design-htools-2.3.rst \
>       doc/design-http-server.rst \
>       doc/design-hugepages-support.rst \
> +     doc/design-ifdown.rst \
>       doc/design-impexp2.rst \
>       doc/design-internal-shutdown.rst \
>       doc/design-kvmd.rst \
> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> index e3531ee..f6a4e49 100644
> --- a/doc/design-draft.rst
> +++ b/doc/design-draft.rst
> @@ -24,6 +24,7 @@ Design document drafts
>     design-systemd.rst
>     design-cpu-speed.rst
>     design-performance-tests.rst
> +   design-ifdown.rst
>  
>  .. vim: set textwidth=72 :
>  .. Local Variables:
> diff --git a/doc/design-ifdown.rst b/doc/design-ifdown.rst
> new file mode 100644
> index 0000000..7626da9
> --- /dev/null
> +++ b/doc/design-ifdown.rst
> @@ -0,0 +1,156 @@
> +======================================
> +Design for adding ifdown script to KVM
> +======================================
> +
> +.. contents:: :depth: 4
> +
> +This is a design document about adding support for an ifdown script responsible
> +for deconfiguring network devices and cleaning up changes made by the ifup script. The
> +first implementation will target KVM but it could be ported to Xen as well,
> +especially when hotplug gets implemented.
> +
> +
> +Current state and shortcomings
> +==============================
> +
> +Currently, before instance startup, instance migration and NIC hotplug, KVM
> +creates a tap and explicitly invokes the kvm-ifup script with the relevant
> +environment (INTERFACE, MAC, IP, MODE, LINK, TAGS, and all the network info if
> +any: NETWORK\_SUBNET, NETWORK\_TAGS, etc.).
> +
> +For Xen we have the `vif-ganeti` script (associated with the vif-script hypervisor
> +parameter). The main difference is that Xen calls it by itself, since it is passed as
> +an extra option in the configuration file.
> +
> +This ifup script can do several things: bridge a tap to a bridge, add ip rules,
> +update an external DNS or DHCP server, enable proxy ARP or proxy NDP, issue
> +openvswitch commands, etc.  In general we can divide those actions into two
> +categories:
> +
> +1) Commands that change the state of the host
> +2) Commands that change the state of external components.
> +
> +Currently those changes do not get cleaned up or modified upon instance
> +shutdown, remove, migrate, or NIC hot-unplug. Thus we have stale entries on
> +hosts and, more importantly, might have stale/invalid configuration on external
> +components like routers that could affect connectivity.
> +
> +A workaround could be hooks but:
> +
> +1) During migrate hooks, the environment is the one held in config data
> +and not in runtime files. The NIC configuration might have changed on
> +master but not on the running KVM process (unless hotplug is used).
> +Plus, the NIC order in config data might not be the same as in the KVM
> +process.
> +
> +2) On instance modification, changes are not available to hooks. In
> +other words, we do not know the configuration before and after modification.
> +
> +Since Ganeti is the orchestrator and is the one that explicitly configures
> +host devices (tap, vif), it should be the one responsible for cleanup/
> +deconfiguration. Especially in an SDN approach, this kind of script might
> +be useful to clean up flows in the cluster in order to ensure correct paths
> +without ping pongs between hosts or connectivity loss for the instance.
> +
> +
> +Proposed Changes
> +================
> +
> +We add a new script, kvm-ifdown, that is explicitly invoked after:
> +
> +1) instance shutdown on primary node
> +2) successful instance migration on source node
> +3) failed instance migration on target node
> +4) successful NIC hot-remove on primary node
> +
> +If an administrator's custom ifdown script exists (e.g. `kvm-ifdown-custom`),
> +the `kvm-ifdown` script executes that script, as happens with `kvm-ifup`.
> +
> +Along with that change we should rename the custom ifup script from
> +`kvm-vif-bridge` (which does not make any sense) to `kvm-ifup-custom`.
> +
> +In contrast to `kvm-ifup`, one cannot rely on the `kvm-ifdown` script being
> +called. A node might die just after a successful migration or after an
> +instance shutdown. In that case, none of the "undo" operations will be invoked.
> +Thus, this script should work "on a best effort basis" and the network
> +should not rely on the script being called or being successful. Additionally,
> +it should modify *only* the node-local dynamic configs (routes, arp entries,
> +SDN, firewalls, etc.), whereas static ones (DNS, DHCP, etc.) should be modified
> +via hooks.
> +
> +
> +Implementation Details
> +======================
> +
> +1) Where to get the NIC info?
> +
> +We cannot count on config data since it might have changed. So the only
> +place we keep our valid data is inside the runtime file. During instance
> +modifications (NIC hot-remove, hot-modify) we have the NIC object from
> +the RPC. We take its UUID and search for the corresponding entry in the
> +runtime file to get further info. After instance shutdown and migration
> +we just take all NICs from the runtime file and invoke the ifdown script
> +for each one.
> +
> +2) Where to find the corresponding TAP?
> +
> +Currently TAP names are kept under
> +/var/run/ganeti/kvm-hypervisor/nics/<instance>/<nic\_index>.
> +This is not enough. As noted above, a NIC's index might change during an instance's
> +life. An example will make things clear:
> +
> +* The admin starts an instance with three NICs.
> +* The admin removes the second without hotplug.
> +* The admin removes the first with hotplug.
> +
> +The index that will arrive with the RPC will be 1, and if we read the relevant
> +NIC file we will get the tap of the NIC that was removed in the second
> +step but still exists in the KVM process.
> +
> +So upon TAP creation we write another file with the same info but named
> +after the NIC's UUID. The one named after its index can be left
> +for compatibility (Ganeti does not use it; external tools might).
> +Obviously this info will not be available for old instances in the cluster.
> +The ifdown script should be aware of this corner case.
> +
> +3) What should we clean up/deconfigure?
> +
> +Upon NIC hot-remove we obviously want to wipe everything. But on instance
> +migration we don't want to reset external configuration like DNS.  So we choose
> +to pass an extra positional argument to the ifdown script (it already has the
> +TAP name) that will reflect the context in which it was invoked. Please note that
> +de-configuration of external components is not encouraged and should be
> +done via hooks. Still we could easily support it via this extra argument.
> +
> +4) What will be the script environment?
> +
> +In general, the same environment passed to the ifup script, except for the
> +instance's tags. Those are the only info not kept in the runtime file, and they can
> +change between ifup and ifdown script execution. The ifdown
> +script must be aware of this and should clean up everything that the ifup script
> +might set up depending on instance tags (e.g. firewalls).
> +
> +
> +Configuration Changes
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +1) The `kvm-ifdown` script will be an extra file installed in the same directory
> +   where `kvm-ifup` resides. We could have a single script (and symbolic links to it)
> +   that shares the same code, where a second positional argument or an extra
> +   environment variable would define if we are bringing the interface up or
> +   down. Still, this is not the best practice since it does not match
> +   how KVM uses `script` and `downscript` in the `netdev` option; scripts
> +   are different files that get the tap name as positional argument. Of course,
> +   common code will go in `net-common` so that it can be sourced from either
> +   Xen or KVM specific scripts.
> +
> +2) An extra file written upon TAP creation, named after the NIC's UUID and
> +   containing the TAP's name. Since this should be the authoritative file, to keep
> +   backwards compatibility we create a symbolic link named after the NIC's
> +   index and pointing to this new file.
> +
> +.. vim: set textwidth=72 :
> +.. Local Variables:
> +.. mode: rst
> +.. fill-column: 72
> +.. End:
> -- 
> 1.7.10.4
> 
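
For reviewers, here is a minimal sketch of the UUID-named TAP file and
index symlink proposed under "Configuration Changes" above. It is not
code from the patch: `write_nic_files`, `find_tap`, and the directory
passed in as `nic_dir` are hypothetical names chosen for illustration,
and the fallback to the index file for old instances is only one
possible reading of the corner case the document mentions.

```python
import os


def write_nic_files(nic_dir, nic_uuid, nic_index, tap_name):
    """Record a TAP name under the NIC's UUID and link the index to it.

    Hypothetical helper: the UUID-named file holds the data, and the
    index-named entry is a symlink kept only for external tools.
    """
    os.makedirs(nic_dir, exist_ok=True)
    uuid_path = os.path.join(nic_dir, nic_uuid)
    with open(uuid_path, "w") as f:
        f.write(tap_name)

    index_path = os.path.join(nic_dir, str(nic_index))
    # Drop any stale index entry before re-creating the link.
    if os.path.lexists(index_path):
        os.remove(index_path)
    # Relative target: resolved against the directory holding the link.
    os.symlink(nic_uuid, index_path)
    return uuid_path


def find_tap(nic_dir, nic_uuid, nic_index=None):
    """Return the TAP name for a NIC, preferring the UUID-named file.

    Falls back to the index-named file (assumed behaviour for instances
    started before UUID files existed). Returns None if neither exists.
    """
    candidates = [nic_uuid]
    if nic_index is not None:
        candidates.append(str(nic_index))
    for name in candidates:
        try:
            with open(os.path.join(nic_dir, name)) as f:
                return f.read().strip()
        except FileNotFoundError:
            continue
    return None
```

Looking up by UUID first means a NIC whose index has shifted in config
data still maps to its own TAP; the index-based fallback only applies
when no UUID-named file exists.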

-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
