On Mon, Jan 13, 2014 at 10:19:11am +0100, Jose A. Lopes wrote:
> [snip]

Hello Jose,

Thanks for your detailed answer!
Comments follow inline.

> > > +On the host side, these TAP network interfaces will have IP address
> > > +``169.254.169.254`` in the network ``169.254.0.0/16`` (i.e., netmask
> > > +``255.255.0.0``).  On the guest side, each instance will have its own MAC

This was my initial concern: If I set up an interface (say tap0) to have
IP 169.254.169.254 with netmask 255.255.0.0, doesn't this imply that
there is going to be a new entry in the routing table to route network
169.254.0.0/16 via tap0? How can this work if I assign the *same* IP,
with the same non-/32 netmask, to multiple interfaces?

> > > +address and an IP address in the network ``169.254.0.0/16``.  The MAC address
> > > +and the IP address must be unique within a single host. The guest will use the
> > 
> > It's not very clear to me who will be responsible for setting up
> > the host side of the TAP interfaces. Who will be responsible for
> > assigning the IP address 169.254.169.254 on all TAP interfaces of the
> > host, and what will the routing rules be? To clarify, say I have 3 VMs,
> > on tap0, tap1 and tap2, with IPs 169.254.0.1, 169.254.0.2, 169.254.0.3
> > respectively.
> 
> Ganeti configures the interfaces.  It seems a good idea for Ganeti to
> create the TAP interfaces and pass them as file descriptors to KVM, as
> you suggested.  Ganeti will then configure the IP address on the
> interface.  I will update the design doc with this.
> 
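
For what it's worth, here is roughly what "Ganeti creates the TAP and passes the fd to KVM" would boil down to. This is only a sketch and not Ganeti's actual code. The helper names (build_ifreq, open_tap, kvm_nic_args) are made up for illustration. The TUNSETIFF/IFF_* values are the ones from <linux/if_tun.h> on x86/x86_64 Linux, which I am assuming here.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h> (assumed x86/x86_64 Linux; TUNSETIFF is
# encoded differently on a few other architectures).
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(ifname, flags):
    """Pack the ifr_name and ifr_flags fields of a struct ifreq."""
    name = ifname.encode("ascii")
    if len(name) >= IFNAMSIZ:  # room is needed for the trailing NUL
        raise ValueError("interface name too long: %r" % ifname)
    return struct.pack("%dsH" % IFNAMSIZ, name, flags)


def open_tap(ifname):
    """Create TAP device `ifname` and return its fd (needs CAP_NET_ADMIN).

    The (non-persistent) device exists as long as the fd is open; the fd
    is marked inheritable so that a child process such as KVM can use it.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(ifname, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    os.set_inheritable(fd, True)
    return fd


def kvm_nic_args(mac, tap_fd):
    """KVM NIC arguments using an already-open TAP fd instead of ifname=."""
    return ["-net", "nic,macaddr=%s" % mac,
            "-net", "tap,fd=%d" % tap_fd]
```

KVM would then be started with something like
subprocess.Popen(["kvm"] + kvm_nic_args(mac, fd) + [...], pass_fds=[fd]).
Because the interface already exists once open_tap() returns, Ganeti can
configure its host side (address, routes) independently of KVM.
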
> > If the host has IP 169.254.169.254 on all interfaces, with the same /16
> > netmask, how will it be able to pick the right interface when sending an
> > IP packet to VM1 vs. when sending to VM3?
> > 
> > I think this could work with explicit routes: One to 169.254.0.1/32
> > through tap0, one to 169.254.0.2/32 through tap1, and one to
> > 169.254.0.3/32 through tap2. If yes, will Ganeti set up these routes
> > explicitly?
> 
> Isn't this already solved with the 'route add -host ...'?
> 

Yes, I think this would be OK. For example, in the case of VM1 being on
tap0 and having IP 169.254.0.1, this would add a new routing entry with
a /32 netmask, e.g.:

# ip ro ls
...
169.254.0.1 dev tap0  proto static  scope link

So, also in comparison to the discussion about the netmask above, what sense would
it make to have IP 169.254.169.254 with a /16 netmask on this interface? Would
it be better to say something along the lines of "The host will have IP
169.254.169.254 with a netmask of /32 on all interfaces, and explicit routes
for each VM behind its interface"? (e.g., 169.254.0.1/32 dev tap0,
169.254.0.2/32 dev tap1, and so on).

# ip addr add 169.254.169.254 dev tap0
# ip addr list
...
    inet 169.254.169.254/32 scope global tap0

Otherwise, specifying a /16 netmask leads to the creation of an
(unwanted?) routing entry for the whole /16 network on every TAP
interface, i.e., multiple routing entries which don't really contribute anything:

169.254.0.0/16 dev tap0  proto kernel  scope link  src 169.254.169.254
169.254.0.0/16 dev tap1  proto kernel  scope link  src 169.254.169.254
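
To make the /32 proposal concrete, here is a small sketch of the per-VM
host-side configuration. It is written as a Python helper that only builds
iproute2 command lines, so it can be reviewed without root. The function
names are hypothetical; the commands are plain `ip link`, `ip addr` and
`ip route`.

```python
import ipaddress

# Values from the design doc: one shared host address on every TAP, and
# guest addresses taken from the link-local /16.
HOST_ADDR = ipaddress.ip_address("169.254.169.254")
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def tap_setup_commands(tap, guest_ip):
    """iproute2 commands for one guest: host /32 plus a host route."""
    guest = ipaddress.ip_address(guest_ip)
    if guest not in LINK_LOCAL or guest == HOST_ADDR:
        raise ValueError("unusable guest address: %s" % guest_ip)
    return [
        ["ip", "link", "set", "dev", tap, "up"],
        # A /32 address adds no connected 169.254.0.0/16 route on this TAP.
        ["ip", "addr", "add", "%s/32" % HOST_ADDR, "dev", tap],
        # Same effect as `route add -host <ip> dev <ifname>` in the patch.
        ["ip", "route", "add", "%s/32" % guest, "dev", tap],
    ]


def host_setup_commands(vms):
    """vms: iterable of (tap, guest_ip); guest IPs must be unique per host."""
    seen = set()
    commands = []
    for tap, guest_ip in vms:
        guest = ipaddress.ip_address(guest_ip)
        if guest in seen:
            raise ValueError("duplicate guest address: %s" % guest_ip)
        seen.add(guest)
        commands.extend(tap_setup_commands(tap, guest_ip))
    return commands
```

Each command could then be run with subprocess.check_call(cmd). The
routing table ends up with one /32 entry per VM and no per-TAP /16 entries,
which is what I was suggesting above.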

> > In a similar note, who will be responsible for setting up the DHCP
> > server? It could be the administrator's responsibility, but then if it
> > is Ganeti the entity which picks the MAC addresses and IPs for the guest
> > side of the TAP interfaces, how will this DHCP server be notified, so as
> > to only serve the correct IP addresses to specific MAC addresses?
> 
> Ganeti configures the DHCP server, starts it and stops it.  Ganeti
> also reconfigures the DHCP server when a new VM is started/stopped.
> The DHCP server listens only on the TAP interfaces for the VMs so it
> shouldn't interfere with other DHCP servers running on the host.  I
> will make it more clear in the design doc.
> 
> Currently, I have only experimented with 'dnsmasq'.  This DHCP server
> allows all of the above.  The only thing that could be improved is the
> fact that it is not possible to dynamically extend the interfaces
> 'dnsmasq' is listening to.  Therefore, it is necessary to update the
> configuration file and restart 'dnsmasq'.
> 

Have you been able to give specific (tap, MAC, IP) tuples to dnsmasq,
somehow binding a MAC address to a specific TAP interface?
In other words, how do you instruct dnsmasq to honor a DHCP request
from a specific MAC address only if it comes from a specific TAP?

I'm looking at the dnsmasq manpage for the "--dhcp-host" argument:

-G, --dhcp-host=[<hwaddr>][,id:<client_id>|*][,set:<tag>][,<ipaddr>][,<hostname>][,<lease_time>][,ignore]

and can't seem to find an obvious way to do it.
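
For reference, here is the kind of static configuration I would expect
Ganeti to regenerate (and restart dnsmasq with) on every VM start/stop.
It is an untested sketch, and the helper name is hypothetical. It relies
only on the port=, interface=, dhcp-host= and dhcp-range= options, using
the `static` mode so that only hosts listed in dhcp-host are served. Note
that it ties each MAC to an IP, but nothing in it ties a MAC to a
particular TAP, which is exactly the part I can't find a way to express.

```python
def dnsmasq_conf(vms):
    """Render a dnsmasq config for the communication TAPs.

    vms: iterable of (tap, mac, guest_ip) tuples.
    """
    vms = list(vms)
    if not vms:
        # With no interface= lines dnsmasq would listen on every interface.
        raise ValueError("refusing to render a config without TAPs")
    lines = [
        "port=0",  # disable the DNS part of dnsmasq; only DHCP is needed
        # 'static' mode: no dynamic pool, serve only hosts with a dhcp-host
        "dhcp-range=169.254.0.0,static,255.255.0.0",
    ]
    # Listen only on the guests' TAPs, not on the host's other NICs.
    lines.extend("interface=%s" % tap for tap, _, _ in vms)
    # Binds each MAC to an IP, but NOT to the TAP the request arrives on.
    lines.extend("dhcp-host=%s,%s" % (mac, ip) for _, mac, ip in vms)
    return "\n".join(lines) + "\n"
```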

Can you share more information on your experimental setup?
Is every TAP interface independent, or do you have them all on a bridge?

I'll come back to the issue of updating dnsmasq configurations and handling
multiple TAP interfaces concurrently in a reply to your other mails about
nfdhcpd.

Thanks,
Vangelis.

> > Also, if it is the administrator's responsibility, then perhaps the
> > admin should be able to set up standard ifup hooks, like for every
> > other interface of an instance. But in the following examples, you
> > specifically set script=no,downscript=no.
> > 
> > Another possibility would be for Ganeti to come prepackaged with its
> > own, embedded DHCP server just for serving requests on the TAPs used for
> > the communication mechanism. We've been using snf-nfdhcpd
> > (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> > production.
> >
> > Actually, in previous conversation Guido had asked us to document how to
> > set it up with Ganeti, and merge the resulting docs with the Ganeti
> > upstream. Perhaps it would make sense to combine the effort now, and use
> > snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> > documented it earlier.
> 
> I'm going to have a look at this and ask Guido about it.
> 
> > > +DHCP protocol on its last network interface to contact a DHCP server running on
> > > +the host and thus determine its IP address.  The DHCP server will be listening
> > > +exclusively on the TAP network interfaces of the guests.  Therefore, it will not
> > > +interfere with a potential DHCP server running on the same host.  Furthermore,
> > > +the DHCP server will only recognize MAC and IP address pairs that have been
> > > +approved by Ganeti.
> > > +
> > > +The TAP network interfaces created for each guest all share the same IP address.
> > > +Therefore, it will be necessary to extend the routing table with rules specific
> > > +to each guest.  This can be achieved with the following command, which takes the
> > > +guest's unique IP address and its TAP interface::
> > > +
> > > +  route add -host <ip> dev <ifname>
> > > +
> > > +For KVM, an instance will be started with a unique MAC address and the TAP
> > > +network interface name meant to be used by the communication mechanism.  KVM
> > > +creates the actual interface::
> > > +
> > > +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> > > +
> > 
> > If I understand correctly, in previous versions of Ganeti it used to be
> > the case that KVM opened the actual TAP interface, upon initialization
> > of the KVM process. This was changed however (see commit 5d9bfd870a) so
> > that Ganeti itself created the TAP interface, then passed it as an open
> > file descriptor to the KVM process. Is there any reason to deviate from
> > this, and make handling the TAP interface for the communication
> > mechanism a special case?
> > 
> > Also, the same question applies as above. If setting up the DHCP server 
> > is the responsibility of the administrator, then perhaps Ganeti should
> > support running ifup hooks for the TAPs. Or, Ganeti could come with its
> > own embedded DHCP server and handle everything by itself, without
> > messing with an already existing DHCP server.
> > 
> > Thanks,
> > Vangelis.
> > 
> > > +For Xen, a network interface will be created on the host (using the ``vif``
> > > +parameter of the Xen configuration file).  Each instance will have its
> > > +corresponding ``vif`` network interface on the host.  The ``vif-route`` script
> > > +of Xen might be helpful in implementing this.
> > > +
> > > +
> > > +Metadata service
> > > +++++++++++++++++
> > > +
> > > +An instance will be able to reach the metadata service on ``169.254.169.254:80`` in
> > > +order to, for example, retrieve its metadata.  This IP address and port were
> > > +chosen for compatibility with the OpenStack and Amazon EC2 metadata service.
> > > +The metadata service will be provided by a single daemon, which will determine
> > > +the source instance for a given request and reply with the metadata pertaining
> > > +to that instance.
> > >  
> > >  Where possible, the metadata will be provided in a way compatible with Amazon
> > >  EC2, at::
> > >  
> > >    http://169.254.169.254/<version>/meta-data/*
> > >  
> > > -If some metadata are Ganeti-specific and don't fit this structure, they will be
> > > -provided at::
> > > +Ganeti-specific metadata that does not fit this structure will be provided
> > > +at::
> > >  
> > >    http://169.254.169.254/ganeti/<version>/meta_data.json
> > >  
> > > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
> > > -the most recent available protocol version.
> > > +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
> > > +indicate the most recent available protocol version.
> > >  
> > >  If needed in the future, this structure also allows us to support OpenStack's
> > >  metadata at::
> > >  
> > >    http://169.254.169.254/openstack/<version>/meta_data.json
> > >  
> > > -A bi-directional, pipe-like communication channel will be provided. The instance
> > > -will be able to receive data from the host by a GET request at::
> > > +A bi-directional, pipe-like communication channel will also be provided.  The
> > > +instance will be able to receive data from the host by a GET request at::
> > >  
> > >    http://169.254.169.254/ganeti/<version>/read
> > >  
> > > @@ -331,12 +341,10 @@ and to send data to the host by a POST request at::
> > >    http://169.254.169.254/ganeti/<version>/write
> > >  
> > >  As in a pipe, once the data are read, they will not be in the buffer anymore, so
> > > -subsequent GET requests to ``read`` will not return the same data twice.
> > > -Unlike a pipe, though, it will not be possible to perform blocking I/O
> > > -operations.
> > > +subsequent GET requests to ``read`` will not return the same data.  However,
> > > +unlike a pipe, it will not be possible to perform blocking I/O operations.
> > >  
> > > -The OS parameters will be accessible through a GET
> > > -request at::
> > > +The OS parameters will be accessible through a GET request at::
> > >  
> > >    http://169.254.169.254/ganeti/<version>/os/parameters.json
> > >  
> > > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the appliance. It is mainly
> > >  meant as a safety measure to prevent an instance taken over by malicious scripts
> > >  to be available for a long time.
> > >  
> > > -.. vim: set textwidth=72 :
> > > -.. Local Variables:
> > > -.. mode: rst
> > > -.. fill-column: 72
> > > -.. End:
> > > +
> > > +Port forwarding in KVM
> > > +++++++++++++++++++++++
> > > +
> > > +The communication mechanism could have been implemented in KVM using guest port
> > > +forwarding, as opposed to network interfaces.  There are two alternatives in
> > > +KVM's guest port forwarding, namely, creating a forwarding device, such as a
> > > +TCP/IP connection, or executing a command.  However, we have determined that
> > > +neither of these options is viable.
> > > +
> > > +A TCP/IP forwarding device can be created through the following KVM invocation::
> > > +
> > > +  kvm -net nic -net \
> > > +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> > > +
> > > +This invocation even has the advantage that it can remap ports, which would have
> > > +allowed the metadata service daemon to run on port 8080 instead of 80.  However,
> > > +in this scheme, KVM opens the TCP connection only once, when it is started, and,
> > > +if the connection breaks, KVM will not reconnect.  Furthermore, this also
> > > +interferes with the HTTP protocol, which needs to dynamically establish and
> > > +close connections.
> > > +
> > > +The alternative to opening a single TCP/IP connection is to execute a command.
> > > +The KVM invocation for this is, for example, the following::
> > > +
> > > +  kvm -net nic -net \
> > > +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-cmd:netcat 127.0.0.1 8080" ...
> > > +
> > > +The advantage of this approach is that the command is executed each time the
> > > +guest initiates a connection.  This is the ideal situation; however, it is only
> > > +supported in KVM 1.2 and above and is, therefore, not viable, because we want to
> > > +provide support for at least KVM version 1.0, which is the version provided by
> > > +Ubuntu LTS.
> > > +
> > > +
> > > +Alternatives to the DHCP server
> > > ++++++++++++++++++++++++++++++++
> > > +
> > > +There are alternatives to using the DHCP server, for example, assigning
> > > +identical IP addresses to guests, such as the IP address ``169.254.169.253``.
> > > +However, this introduces a routing problem, namely, how the host should route
> > > +traffic for guests that all send packets from the same source IP.  This problem
> > > +can be overcome in a number of ways.
> > > +
> > > +The first solution is to use NAT to translate the incoming guest IP address, for
> > > +example, ``169.254.169.253``, to an IP address unique within a single host, for
> > > +example, ``169.254.0.1``.  Given that NAT through ``ip rule`` is deprecated,
> > > +users can resort to ``iptables``.  Note that this has not yet been tested.
> > > +
> > > +Another option, which has indeed been tested in a prototype, is to connect the
> > > +TAP network interfaces of the guests to a bridge.  The bridge takes the
> > > +configuration for the TAP network interfaces, namely, IP address
> > > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those interfaces
> > > +without an IP address.  Note that in this setting, guests will be able to reach
> > > +each other; therefore, if necessary, additional ``iptables`` rules can be put in
> > > +place to prevent it.
> > > -- 
> > > 1.8.5.1
> > 
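
Regarding the metadata service part of the patch: to check my
understanding of the read/write semantics, here is a toy, in-memory model
of the Ganeti-specific endpoints. It is a hypothetical sketch, and the
class and method names are mine. It covers only the /ganeti/<version>/...
paths and leaves out the HTTP server and the EC2-style paths entirely.

```python
import json
import re

# Matches /ganeti/<version>/<resource>; <version> is YYYY-MM-DD or "latest".
_GANETI_PATH = re.compile(r"^/ganeti/(latest|\d{4}-\d{2}-\d{2})/(.+)$")


class MetadataSketch(object):
    """In-memory model of the Ganeti-specific metadata endpoints."""

    def __init__(self):
        self.ip_to_instance = {}  # guest IP -> instance name
        self.metadata = {}        # instance -> dict served as meta_data.json
        self.os_params = {}       # instance -> dict of OS parameters
        self.to_guest = {}        # instance -> bytes waiting for GET read
        self.from_guest = {}      # instance -> bytes received via POST write

    def add_instance(self, name, ip, metadata=None, os_params=None):
        self.ip_to_instance[ip] = name
        self.metadata[name] = metadata or {}
        self.os_params[name] = os_params or {}
        self.to_guest[name] = b""
        self.from_guest[name] = b""

    def host_write(self, name, data):
        """Host side: queue data for the guest's next GET read."""
        self.to_guest[name] += data

    def host_read(self, name):
        """Host side: consume whatever the guest POSTed to write."""
        data, self.from_guest[name] = self.from_guest[name], b""
        return data

    def handle(self, method, path, client_ip, body=b""):
        """Return (HTTP status, response body) for one guest request."""
        # The request's source address is what identifies the instance.
        name = self.ip_to_instance.get(client_ip)
        if name is None:
            return 403, b""
        match = _GANETI_PATH.match(path)
        if match is None:
            return 404, b""
        resource = match.group(2)
        if method == "GET" and resource == "meta_data.json":
            return 200, json.dumps(self.metadata[name]).encode()
        if method == "GET" and resource == "os/parameters.json":
            return 200, json.dumps(self.os_params[name]).encode()
        if method == "GET" and resource == "read":
            # Like a pipe, data is consumed on read; unlike a pipe, an empty
            # buffer returns immediately instead of blocking.
            data, self.to_guest[name] = self.to_guest[name], b""
            return 200, data
        if method == "POST" and resource == "write":
            self.from_guest[name] += body
            return 200, b""
        return 404, b""
```

The instance is identified solely by the request's source address, which
is why the per-TAP routing discussed above matters. A real daemon would
also need to make sure, for example through reverse-path filtering on the
TAPs, that a guest cannot spoof another guest's address.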

-- 
Vangelis Koukis
vkou...@grnet.gr
OpenPGP public key ID:
pub  1024D/1D038E97 2003-07-13 Vangelis Koukis <vkou...@cslab.ece.ntua.gr>
     Key fingerprint = C5CD E02E 2C78 7C10 8A00  53D8 FBFC 3799 1D03 8E97

Only those who will risk going too far
can possibly find out how far one can go.
        -- T.S. Eliot
