Hi all,

after looking through the Mesos source code for a while, here are
some of my initial thoughts.

There seem to be at least two issues that can be tackled separately:
 - Communication between Mesos daemons over the network
 - Communication in and out of containers when using network isolation

Having the first one would already be valuable for installations that
don't use network isolation, so I'll focus on this for now.

If a Mesos master daemon runs on, say, mesos-master.example.org:5050, and
this host has both A and AAAA records configured, it seems desirable that
slaves can communicate with this node over either IPv4 or IPv6, depending
on their own capabilities.

From the client perspective, the problem is solved by the "Happy
Eyeballs" algorithm, i.e. trying to connect over both IPv4 and IPv6 and
using whichever connection succeeds. The only complication is that
address resolution should probably be delayed until we actually want to
connect, to avoid spurious failures.
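A minimal sketch of the client side, assuming plain POSIX sockets rather than libprocess (the helper name connect_any is hypothetical): the host name is resolved with getaddrinfo() and AF_UNSPEC only at connect time, and each returned address is tried in turn. This is a simplified sequential fallback, not full Happy Eyeballs (RFC 8305), which starts the next attempt after a short delay instead of waiting for the previous one to fail.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstring>
#include <string>

// Hypothetical helper: resolve `host` only now, at connect time, asking for
// both A and AAAA results, and try each address in the order returned.
// Returns a connected socket fd, or -1 if no address accepted the connection.
// NOTE: sequential fallback only; real Happy Eyeballs (RFC 8305) would start
// the next attempt after a short delay instead of waiting for a failure.
int connect_any(const std::string& host, const std::string& port) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;  // accept both IPv4 and IPv6 results
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* results = nullptr;
  if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0) {
    return -1;  // resolution failed
  }

  int fd = -1;
  for (addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0) {
      continue;  // e.g. this address family is not supported locally
    }
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
      break;  // connected: keep this fd
    }
    close(fd);
    fd = -1;
  }

  freeaddrinfo(results);
  return fd;
}
```

Usage would be e.g. connect_any("mesos-master.example.org", "5050"). The blocking connect() here has no timeout, so an unreachable address can stall the loop for a long time; avoiding exactly that stall is what the delayed, racing attempts in Happy Eyeballs are for.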

On the server side it is a bit more subtle, since the server has to
decide which address it should bind its listening socket to. Some
possibilities would be:

 1) Do nothing special, just bind to the address that was specified
 2) Allow specifying multiple listen IPs
 3) Allow specifying a network interface and port, and open two separate
listening sockets for IPv4 and IPv6

These are not mutually exclusive.

It seems that (2) and (3) would be desirable anyway, since they would
also enable running on hosts with multiple network interfaces.
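Option (3) could look roughly like the following, again as a sketch on raw POSIX sockets under my own assumptions: listen_on_interface is a hypothetical name, and only the first address of each family found on the interface is used, which for IPv6 may well be a link-local one. A real implementation would need a policy for choosing among several addresses.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch of option (3): open one listening socket per address
// family found on the named interface, each on the given port.
// Only the first address of each family is used.
std::vector<int> listen_on_interface(const std::string& iface, uint16_t port) {
  std::vector<int> fds;
  ifaddrs* ifs = nullptr;
  if (getifaddrs(&ifs) != 0) {
    return fds;
  }

  bool have_v4 = false;
  bool have_v6 = false;
  for (ifaddrs* ifa = ifs; ifa != nullptr; ifa = ifa->ifa_next) {
    if (ifa->ifa_addr == nullptr || iface != ifa->ifa_name) {
      continue;
    }
    const int family = ifa->ifa_addr->sa_family;
    if (!((family == AF_INET && !have_v4) || (family == AF_INET6 && !have_v6))) {
      continue;  // other family (e.g. AF_PACKET) or already have a socket
    }

    // Copy the interface address and fill in the requested port.
    sockaddr_storage addr;
    std::memset(&addr, 0, sizeof(addr));
    socklen_t len;
    if (family == AF_INET) {
      len = sizeof(sockaddr_in);
      std::memcpy(&addr, ifa->ifa_addr, len);
      reinterpret_cast<sockaddr_in*>(&addr)->sin_port = htons(port);
    } else {
      len = sizeof(sockaddr_in6);  // keeps sin6_scope_id for link-local
      std::memcpy(&addr, ifa->ifa_addr, len);
      reinterpret_cast<sockaddr_in6*>(&addr)->sin6_port = htons(port);
    }

    int fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
      continue;
    }
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), len) != 0 ||
        listen(fd, SOMAXCONN) != 0) {
      close(fd);
      continue;
    }
    fds.push_back(fd);
    if (family == AF_INET) {
      have_v4 = true;
    } else {
      have_v6 = true;
    }
  }

  freeifaddrs(ifs);
  return fds;
}
```

Passing port 0 lets the kernel pick an ephemeral port independently for each socket, so a daemon would pass its fixed port (e.g. 5050) instead.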

It is, however, worth noting that (1) already gets us quite far without
changing the assumption that there is a single IP associated with a Mesos
daemon: if an IPv4 address is specified, things will work the same as
before, and if the IPv6 wildcard address (::) is specified, the socket
will on Linux by default accept connections from both IPv4 and IPv6
sources (IPv4 peers then appear as IPv4-mapped addresses such as
::ffff:192.0.2.1). This behaviour can even be changed at system level if
not desired (via /proc/sys/net/ipv6/bindv6only, or the corresponding
setting on other platforms, where the default may differ), or per socket
via the IPV6_V6ONLY option.
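For (1), the dual-stack behaviour can also be requested explicitly instead of relying on the system default. A hedged sketch, assuming raw POSIX sockets (both function names are hypothetical): the socket is bound to [::]:port with IPV6_V6ONLY set to 0, and a helper turns IPv4-mapped peer addresses back into plain IPv4 text, which is roughly what code that logs or compares peer IPs would have to deal with.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>
#include <string>

// Option (1) with the IPv6 wildcard: a single socket bound to [::]:port.
// Setting IPV6_V6ONLY to 0 makes it dual-stack regardless of the
// system-wide default (e.g. Linux's bindv6only). Returns fd or -1.
int listen_dual_stack(uint16_t port) {
  int fd = socket(AF_INET6, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  int off = 0;
  if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) != 0) {
    close(fd);
    return -1;
  }

  sockaddr_in6 addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin6_family = AF_INET6;
  addr.sin6_addr = in6addr_any;
  addr.sin6_port = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, SOMAXCONN) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// IPv4 clients accepted on such a socket show up as IPv4-mapped IPv6
// addresses (::ffff:a.b.c.d). Render a peer address as text, unmapping
// those back to dotted-quad IPv4.
std::string peer_to_string(const sockaddr_in6& peer) {
  char buf[INET6_ADDRSTRLEN];
  if (IN6_IS_ADDR_V4MAPPED(&peer.sin6_addr)) {
    in_addr v4;
    // The embedded IPv4 address is in the last 4 bytes, network byte order.
    std::memcpy(&v4, &peer.sin6_addr.s6_addr[12], sizeof(v4));
    inet_ntop(AF_INET, &v4, buf, sizeof(buf));
  } else {
    inet_ntop(AF_INET6, &peer.sin6_addr, buf, sizeof(buf));
  }
  return std::string(buf);
}
```

Setting the option per socket keeps the daemon's behaviour independent of how the host happens to be configured.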

So, tl;dr: I believe a lot of useful progress could already be
achieved by a relatively small patch series that:

 - Fills in the blanks in stout's net::IP, and gives all functions
   that take an explicit "family" argument a default value of
   AF_UNSPEC
 - Updates the IPv4-specific parts in libprocess (in particular, the
   parsing of IP literals in URL strings and the factory function
   Socket::create(Kind, Option<int> fd), which should probably be split
   into e.g. Socket::make(sa_family_t, Kind) and
   Socket::wrap(int fd, Kind))
 - Changes all call sites to use the new functions
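To make the first two points more concrete, here is a sketch of what family-agnostic parsing might look like. The IP struct, parse_ip and split_host_port are hypothetical simplifications for illustration, not stout's or libprocess's actual API; the parts that matter are the AF_UNSPEC default and the RFC 3986 rule that IPv6 literals in URLs are bracketed, as in http://[::1]:5050/.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified address-family-aware IP type.
// This is NOT stout's net::IP; names are illustrative only.
struct IP {
  int family;  // AF_INET or AF_INET6
  in_addr v4;
  in6_addr v6;
};

// Parse an IP literal. With the default AF_UNSPEC either family is accepted;
// passing AF_INET or AF_INET6 restricts the result to that family.
std::optional<IP> parse_ip(const std::string& text, int family = AF_UNSPEC) {
  IP ip{};
  if ((family == AF_UNSPEC || family == AF_INET) &&
      inet_pton(AF_INET, text.c_str(), &ip.v4) == 1) {
    ip.family = AF_INET;
    return ip;
  }
  if ((family == AF_UNSPEC || family == AF_INET6) &&
      inet_pton(AF_INET6, text.c_str(), &ip.v6) == 1) {
    ip.family = AF_INET6;
    return ip;
  }
  return std::nullopt;
}

// Split a URL authority ("host:port") into host and port. IPv6 literals
// must be bracketed, as in "[::1]:5050", because they contain colons.
std::optional<std::pair<std::string, std::string>> split_host_port(
    const std::string& authority) {
  if (!authority.empty() && authority[0] == '[') {
    const size_t end = authority.find(']');
    if (end == std::string::npos || end + 1 >= authority.size() ||
        authority[end + 1] != ':') {
      return std::nullopt;
    }
    return std::make_pair(authority.substr(1, end - 1),
                          authority.substr(end + 2));
  }
  const size_t colon = authority.rfind(':');
  if (colon == std::string::npos || authority.find(':') != colon) {
    return std::nullopt;  // no port, or an ambiguous unbracketed IPv6 literal
  }
  return std::make_pair(authority.substr(0, colon),
                        authority.substr(colon + 1));
}
```

Rejecting unbracketed input with several colons matters because a string like "::1:5050" is itself a valid IPv6 address, so there is no safe way to guess where the port starts.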

The protobuf IPC format doesn't seem to require any changes, since the
only IPv4-dependent field (MasterInfo.ip) was already deprecated in 0.24.0.

After this, the next step would then be to look at network isolation and
enabling communication in and out of containers.

Thoughts? Comments? Am I missing something?

Best regards,
Benno
