Dan Osborne created MESOS-5325:
----------------------------------
Summary: Mesos can't determine if task IP is reachable
Key: MESOS-5325
URL: https://issues.apache.org/jira/browse/MESOS-5325
Project: Mesos
Issue Type: Bug
Reporter: Dan Osborne
I have uncovered a design flaw that affects ip-per-container tasks when run in
a cluster alongside non-ip-per-container tasks. This affects docker-libnetwork
and netmodules, and I suspect it will also affect CNI.
After Mesos launches a docker bridge task, it fills the task's NetworkInfo
field with the docker bridge IP assigned to that task. Because of this
behavior, Mesos components that later read a launched task's NetworkInfo
cannot tell whether it holds an IP address that is accessible throughout the
cluster or one that is not.
A common case where this is a problem is Mesos-DNS. Mesos-DNS has a
configuration setting that tells it which information to answer a query with:
NetworkInfo or HostIP. If it has been configured to prefer NetworkInfo, it
correctly resolves ip-per-container containers to their unique IP. But because
the docker bridge IP is also stored in NetworkInfo, it incorrectly resolves
docker-bridge containers to an IP address that is not accessible from anywhere
besides the slave they are on. This breaks DNS resolution for those tasks.
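The failure mode described above can be sketched roughly as follows. This is
only an illustration, not Mesos-DNS code: the task names, the dictionary field
names, the `resolve` helper, and the sample addresses are all assumptions made
up for the example.

```python
# Hypothetical task records; field names are illustrative and do not
# mirror the actual Mesos protobuf schema.
TASKS = {
    # ip-per-container task: its NetworkInfo IP is routable cluster-wide.
    "web-ipc": {"host_ip": "10.0.0.5", "network_info_ip": "192.168.10.7"},
    # docker bridge task: its NetworkInfo IP only exists on its own slave.
    "web-bridge": {"host_ip": "10.0.0.6", "network_info_ip": "172.17.0.2"},
}


def resolve(task_name, prefer="network_info"):
    """Return the IP a resolver would answer with for the given task."""
    task = TASKS[task_name]
    if prefer == "network_info" and task.get("network_info_ip"):
        return task["network_info_ip"]
    return task["host_ip"]


if __name__ == "__main__":
    for name in TASKS:
        print(name, resolve(name))
```

With NetworkInfo preferred, the bridge task resolves to its bridge address,
which other slaves cannot reach. Nothing in the record lets the resolver tell
the two cases apart.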
I believe Mesos needs a way to distinguish between tasks which are accessible
via their IP and tasks that are not.
One fix would be to prevent Mesos from filling in NetworkInfo for a task if it
is known that the task is not reachable throughout the cluster via that
address. Essentially, NetworkInfo could be interpreted as a boolean: its
presence means the task is addressable, and its absence means the task is not.
In practice, this would mean it gets filled in for CNI tasks, netmodules
tasks, and docker tasks bound to the host networking namespace. It would not
get filled in for docker bridge tasks.
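A minimal sketch of the proposed semantics, assuming a simple string label for
each networking mode. The mode names, `ROUTABLE_MODES`, and both helper
functions are hypothetical and exist only to show the idea. They are not part
of Mesos.

```python
from typing import Optional

# Networking modes whose task IP is assumed reachable cluster-wide.
# The labels are illustrative, not Mesos identifiers.
ROUTABLE_MODES = {"cni", "netmodules", "docker-host"}


def network_info_ip(mode: str, assigned_ip: str) -> Optional[str]:
    """Return the IP to publish in NetworkInfo, or None to leave it unset."""
    return assigned_ip if mode in ROUTABLE_MODES else None


def address_for(mode: str, assigned_ip: str, host_ip: str) -> str:
    """Consumer side: presence of NetworkInfo means 'addressable'."""
    published = network_info_ip(mode, assigned_ip)
    return published if published is not None else host_ip


if __name__ == "__main__":
    print(address_for("cni", "192.168.10.7", "10.0.0.5"))
    print(address_for("docker-bridge", "172.17.0.2", "10.0.0.6"))
```

Under this scheme a consumer no longer needs to guess: a docker bridge task
publishes nothing, so the consumer falls back to the host IP.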
I believe this change would be fairly minimal in scope. To implement it, Mesos
would need to be changed to not store docker bridge IPs in NetworkInfo.
I'm also open to discussion and other suggestions on how to resolve this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)