Hi all! Without having read this design doc in depth, I think the doc/design-ifdown.rst introduces functionality that macvtap support could find very useful.
Yesterday (without any idea that this design doc was to come :)) I
implemented ifdown on 2.14 (actually ported it from 2.10). I think it is
the perfect timing for submitting the patches upstream (to master I
guess). Are you OK with that?

Thanks,
dimara

* Dimitris Bliablias <[email protected]> [2015-04-22 14:59:49 +0300]:
> This patch adds a design document detailing the implementation
> providing support for the MacVTap device driver in Ganeti.
>
> Signed-off-by: Dimitris Bliablias <[email protected]>
> ---
>
> Hello,
>
> This design document describes the implementation for providing support
> for the MacVTap device driver in Ganeti. An interface that could greatly
> simplify Ganeti setups using bridged instances.
>
> Looking forward for your feedback,
> Dimitris
>
>  Makefile.am            |   1 +
>  doc/design-draft.rst   |   1 +
>  doc/design-macvtap.rst | 262 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 264 insertions(+)
>  create mode 100644 doc/design-macvtap.rst
>
> diff --git a/Makefile.am b/Makefile.am
> index 5068050..b65706d 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -685,6 +685,7 @@ docinput = \
>   doc/design-location.rst \
>   doc/design-linuxha.rst \
>   doc/design-lu-generated-jobs.rst \
> + doc/design-macvtap.rst \
>   doc/design-monitoring-agent.rst \
>   doc/design-move-instance-improvements.rst \
>   doc/design-multi-reloc.rst \
> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> index c589b56..ac107ad 100644
> --- a/doc/design-draft.rst
> +++ b/doc/design-draft.rst
> @@ -28,6 +28,7 @@ Design document drafts
>     design-dedicated-allocation.rst
>     design-allocation-efficiency.rst
>     design-shared-storage-redundancy.rst
> +   design-macvtap.rst
>
>  .. vim: set textwidth=72 :
>  .. Local Variables:
> diff --git a/doc/design-macvtap.rst b/doc/design-macvtap.rst
> new file mode 100644
> index 0000000..e6d9239
> --- /dev/null
> +++ b/doc/design-macvtap.rst
> @@ -0,0 +1,262 @@
> +===============
> +MacVTap support
> +===============
> +
> +.. contents:: :depth: 3
> +
> +This is a design document detailing the implementation providing
> +support for the `MacVTap` device driver in Ganeti. The initial
> +implementation will target the KVM hypervisor, but it is intended to be
> +ported to the XEN hypervisor as well.
> +
> +Current state and shortcomings
> +==============================
> +
> +Currently, Ganeti provides a number of options for networking a virtual
> +machine, i.e., ``bridged``, ``routed``, and ``openvswitch`` modes.
> +``MacVTap``, is another virtual network interface in Linux, that is not
> +supported by Ganeti and we could add it to the currently supported
> +solutions. It is an interface that acts as a regular TUN/TAP device,
> +and thus it is transparently supported by QEMU. Because of its
> +operation, it can greatly simplify Ganeti setups using bridged
> +instances.
> +
> +In brief, it is an interface based on the ``macvlan`` Linux driver,
> +meant to replace the combination of the TUN/TAP and bridge drivers with
> +a simplified setup that doesn't need to do learning or STP as it knows
> +every MAC address it can receive. In fact, it introduces a bridge-like
> +behavior of virtual machines but without the need to have a real bridge
> +setup on the host. Instead, each virtual interface extends an existing
> +network device by attaching directly to it, and has its own MAC address
> +providing a separate virtual interface to be used by the userspace
> +processes. The MacVTap MAC address is used on the external network and
> +the guest OS cannot spoof or change that address.
> +
> +Background
> +==========
> +
> +This section gives some extra information on the MacVTap interface, that
> +we took into account for the rest of this design document.
> +
> +MacVTap modes of operation
> +--------------------------
> +
> +A MacVTap device can operate in one of four modes, like the macvlan
> +driver does, that are defined at creation time and determine how the
> +tap endpoints communicate between each other. Those are the following:
> +
> +* `VEPA (Virtual Ethernet Port Aggregator) mode`: The default mode that
> +  is compatible with virtualization-enabled switches. The communication
> +  between endpoints on the same lower device, happens through the
> +  external switch.
> +
> +* `Bridge mode`: It works almost like a traditional bridge, connecting
> +  all endpoints directly to each other.
> +
> +* `Private mode`: An endpoint in this mode can never communicate to any
> +  other endpoint on the same lower device.
> +
> +* `Passthru mode`: This mode was added later to work on some
> +  limitations on macvlans (more details here_).
> +
> +MacVTap internals
> +-----------------
> +
> +The creation of a MacVTap device is not done by opening the
> +`/dev/net/tun` device and issuing a corresponding `ioctl()` to register
> +a network device as happens in tap devices. Instead, there are two ways
> +to create a MacVTap device. The first one is using the `rtnetlink(7)`
> +interface directly, just like the `libvirt` or the `iproute2` utilities
> +do, and the second one is to use the high-level `ip-link` command. Since
> +creating programmatically a MacVTap interface using the netlink protocol
> +is a bit more complicated than creating a normal TUN/TAP device, we
> +propose using the ip-link tool for the MacVTap handling, which it is
> +more simple and straightforward in use, and also fulfills all our needs.
> +Thus, since Ganeti already depends on `iproute2` being installed in the
> +system, this does not introduces an extra dependency.
> +
> +The following example, creates a MacVTap device using the `ip-link`
> +tool, named `macvtap0`, operating in `bridge` mode, and which is using
> +`eth0` as its lower device:
> +
> +::
> +
> +  ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge
> +
> +Once a MacVTap interface is created, an actual character device appears
> +under `/dev`, called ``/dev/tapXX``, where ``XX`` is the interface index
> +of the device.
> +
> +Proposed changes
> +================
> +
> +In order to be able to create instances using the MacVTap device driver,
> +we propose some modifications that affect the ``nicparams`` slot of the
> +Ganeti's configuration ``NIC`` object, and also the code part regarding
> +to the KVM hypervisor, as detailed in the following sections.
> +
> +Configuration changes
> +---------------------
> +
> +The nicparams ``mode`` attribute will be extended to support the
> +``macvtap`` mode. When using the MacVTap mode, the ``link`` attribute
> +will specify the network device where the MacVTap interfaces will be
> +attached to (the lower device). Note that the lower device should
> +exist, otherwise the operation will fail. If no link is specified, the
> +cluster-wide default NIC `link` param will be used instead.
> +
> +We propose the MacVTap mode to be configurable, and so the nicparams
> +object will be extended with an extra slot named ``mvtap_mode``. This
> +parameter will only be used if the network mode is set to MacVTap since
> +it does not make sense in other modes, similarly to the `vlan` slot of
> +the `openvswitch` mode.
> +
> +Below there is a snippet of some of the ``gnt-network`` commands'
> +output:
> +
> +Network connection
> +~~~~~~~~~~~~~~~~~~
> +
> +::
> +
> +  gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group
> +
> +Network listing
> +~~~~~~~~~~~~~~~
> +
> +::
> +
> +  gnt-network list
> +
> +  Network  Subnet           Gateway       MacPrefix GroupList
> +  br-net   10.48.1.0/24     10.48.1.254   -         default (bridged, br0, , )
> +  vtap-net 192.168.100.0/24 192.168.100.1 -         vtap_group (macvtap, eth0, , bridge)
> +
> +Network information
> +~~~~~~~~~~~~~~~~~~~
> +
> +::
> +
> +  gnt-network info
> +
> +  Network name: vtap-net
> +  UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf
> +  Serial number: 1
> +  Subnet: 192.168.100.0/28
> +  Gateway: 192.168.100.1
> +  IPv6 Subnet: 2001:db8:2ffc::/64
> +  IPv6 Gateway: 2001:db8:2ffc::1
> +  Mac Prefix: None
> +  size: 16
> +  free: 10 (62.50%)
> +  usage map:
> +    0 XXXXX..........X 63
> +      (X) used (.) free
> +  externally reserved IPs:
> +    192.168.100.0, 192.168.100.1, 192.168.100.15
> +  connected to node groups:
> +    vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge)
> +  used by 2 instances:
> +    inst1.example.com: 0:192.168.100.2
> +    inst2.example.com: 0:192.168.100.3
> +
> +
> +Hypervisor changes
> +------------------
> +
> +A new method will be introduced in the KVM's `netdev.py` module, named
> +``OpenVTap``, similar to the ``OpenTap`` method, that will be
> +responsible for creating a MacVTap device using the `ip-link` command,
> +and returning its file descriptor. The ``OpenVtap`` method will receive
> +as arguments the network's `link`, the mode of the MacVTap device
> +(``mvtap_mode``), and also the ``interface name`` of the device to be
> +created, otherwise we will not be able to retrieve it, and so opening
> +the created device.
> +
> +Since we want the names among the MacVTap devices to be unique on the
> +same node, we will make use of the existing ``_GenerateKvmTapName``
> +method to generate device names but with some modifications, to adapt it
> +to our needs. This method is actually a wrapper over the
> +``GenerateTapName`` method which currently is used to generate TAP
> +interface names for NICs meant to be used in instance communication
> +using the `gnt.com` prefix. We propose extending this method to generate
> +names for the MacVTap interface too, using the `vtap` prefix. To do so,
> +we could add an extra boolean argument in this method, named
> +`instance_comm`, to differentiate the two cases so the method returns
> +the appropriate name depending on its usage. This argument will be
> +optional and defaulted to `True`, to not affect the existing API.
> +
> +Currently, the `OpenTap` method handles the `vhost-net`, `mq`, and the
> +`vnet_hdr` features. The `vhost-net` feature will be normally supported
> +for the MacVTap devices too, and so is the `multiqueue` feature, that
> +can be enabled using the `numrxqueues` and `numtxqueues` parameters of
> +the `ip-link` command. The only drawback seems to be the `vnet_hdr`
> +feature modification. For a MacVTap device this flag is enabled by
> +default, and it can not be disabled if the user request to.
> +
> +A final hypervisor change will be the introduction of a new method named
> +``_RemoveStaleMacvtapDevices`` that will remove any remaining MacVTap
> +devices, and which is detailed in the following section.
> +
> +Tools changes
> +-------------
> +
> +Some of the Ganeti tools should also be extended to support MacVTap
> +devices. Those are the ``kvm-ifup`` and ``net-common`` scripts. Those
> +modifications will include a new method named ``setup_macvtap`` that
> +will simply change the device status to `UP` just before we start an
> +instance:
> +
> +::
> +
> +  ip link set $INTERFACE up
> +
> +As mentioned in the `Background` section, MacVTap devices are
> +persistent. So, we have to manually delete the MacVTap device after an
> +instance shutdown. To do so, we propose creating a ``kvm-ifdown``
> +script, that will be invoked after an instance shutdown in order to
> +remove the remaining MacVTap devices. The ``kvm-ifdown`` should
> +explicitly call the following commands and will be functional for
> +MacVTap NICs only:
> +
> +::
> +
> +  ip link set $INTERFACE down
> +  ip link delete $INTERFACE
> +
> +To be able to call the `kvm-ifdown` script we should extend the KVM's
> +``_ConfigureNIC`` method with an extra argument that is the name of the
> +script to be invoked, instead of calling by default the `kvm-ifup`
> +script, as it currently happens.
> +
> +The invocation of the `kvm-ifdown` script will be made through a
> +separate method we will create, named ``_RemoveStaleMacvtapDevices``.
> +This method will read the NIC runtime files of the instance and will
> +remove any devices using the MacVTap interface. This method will be
> +included in the ``CleanupInstance`` method in order to cover all the
> +cases where an instance using MacVTap NICs needs to be cleaned up.
> +
> +Besides the instance shutdown, there are a couple of cases where the
> +MacVTap NICs will need to be cleaned up too. In case of an internal
> +instance shutdown, where the ``kvmd`` is not enabled, the instance will
> +be in ``ERROR_DOWN`` state. In that case, when the instance is started
> +either by the `ganeti-watcher` or by the admin, the ``CleanupInstance``
> +method, and consequently the `kvm-ifdown` script, will not be called
> +and so the MacVTap NICs will have to manually be deleted. Otherwise
> +starting the instance will result in more than one MacVTap devices using
> +the same MAC address. An instance migration is another case where
> +deleting an instance will keep stale MacVTap devices on the source node.
> +In order to solve those potential issues, we will explicitly call the
> +``_RemoveStaleMacvtapDevices`` method after a successful instance
> +migration on the source node, and also before creating a new device for
> +a NIC which is using the macvtap interface to remove any remaining
> +MacVTap devices.
> +
> +
> +.. _here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61824/)
> +
> +.. vim: set textwidth=72 :
> +.. Local Variables:
> +.. mode: rst
> +.. fill-column: 72
> +.. End:
> --
> 2.1.4
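
For anyone following along, here is a rough Python sketch of the device
handling described in the "MacVTap internals" and "Tools changes"
sections of the quoted document: the ip link invocations for creating
and removing a MacVTap device, and the mapping from interface index to
/dev/tapXX. All helper names below are made up for illustration and are
not part of the patch or of Ganeti, and reading the index from
/sys/class/net/<name>/ifindex is only an assumption about how the XX
could be looked up.

```python
"""Illustrative helpers only; none of these names exist in Ganeti."""

# The four modes listed in the design document.
MACVTAP_MODES = ("vepa", "bridge", "private", "passthru")


def build_macvtap_add_cmd(link, name, mode, mac=None):
    """Return the argv for creating a MacVTap device on top of `link`."""
    if mode not in MACVTAP_MODES:
        raise ValueError("unknown MacVTap mode: %r" % (mode,))
    cmd = ["ip", "link", "add", "link", link, "name", name]
    if mac is not None:
        cmd += ["address", mac]
    cmd += ["type", "macvtap", "mode", mode]
    return cmd


def build_macvtap_del_cmds(name):
    """Return the two argv lists the proposed kvm-ifdown would run."""
    return [
        ["ip", "link", "set", name, "down"],
        ["ip", "link", "delete", name],
    ]


def tap_device_path(ifindex):
    """Map an interface index to its character device, /dev/tapXX."""
    return "/dev/tap%d" % ifindex


def read_ifindex(name, sysfs_root="/sys/class/net"):
    """Read the interface index from sysfs (assumed lookup method)."""
    with open("%s/%s/ifindex" % (sysfs_root, name)) as f:
        return int(f.read().strip())
```

Actually running the generated commands needs root and would go through
whatever command runner the hypervisor code already uses; the sketch
only builds the argument lists so that they can be inspected.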
