Dan Osborne created MESOS-5325:
----------------------------------

             Summary: Mesos can't determine if task IP is reachable
                 Key: MESOS-5325
                 URL: https://issues.apache.org/jira/browse/MESOS-5325
             Project: Mesos
          Issue Type: Bug
            Reporter: Dan Osborne


I have uncovered a design flaw that affects ip-per-container tasks when run in 
a cluster alongside non ip-per-container tasks. This affects docker-libnetwork, 
netmodules, and I suspect it will also affect CNI.

After Mesos launches a docker bridge task, it fills the task's NetworkInfo 
field with the docker bridge IP assigned to that task. As a result, when a 
Mesos component later reads a task's NetworkInfo, it cannot tell whether the 
address it contains is reachable throughout the cluster or only on the slave 
running the task.

A common case where this causes problems is Mesos-DNS. Mesos-DNS has a 
configuration setting that tells it which address to answer a query with: 
NetworkInfo or HostIP. If it is configured to prefer NetworkInfo, it 
correctly resolves ip-per-container containers to their unique IP. But 
because the docker bridge IP is also stored in NetworkInfo, it incorrectly 
resolves docker-bridge containers to an IP address that is not reachable from 
anywhere except the slave they run on. This breaks DNS resolution for those 
tasks.
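
To make the failure concrete, here is a minimal sketch of the preference 
logic described above. The function name, the dictionary keys, and the 
addresses are all hypothetical and chosen for illustration; this is not the 
actual Mesos-DNS code or the Mesos task schema.

```python
from typing import Optional

def resolve_ip(task: dict, prefer: tuple = ("network_info", "host_ip")) -> Optional[str]:
    """Return the first address found, trying sources in preference order.

    `task` is a hypothetical record; the keys are illustrative only.
    """
    for source in prefer:
        ip = task.get(source)
        if ip:
            return ip
    return None

# An ip-per-container task: its NetworkInfo address is routable cluster-wide.
ip_per_container = {"network_info": "192.168.10.5", "host_ip": "10.0.0.7"}

# A docker bridge task: its NetworkInfo address only exists on its own slave.
docker_bridge = {"network_info": "172.17.0.2", "host_ip": "10.0.0.7"}

print(resolve_ip(ip_per_container))  # routable address, as intended
print(resolve_ip(docker_bridge))     # bridge-local address, unreachable elsewhere
```

Both tasks look identical to the resolver, so preferring NetworkInfo returns 
the bridge-local address for the second task even though only the host IP 
would work from other machines.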

I believe Mesos needs a way to distinguish between tasks which are accessible 
via their IP and tasks that are not.

One fix would be to prevent Mesos from filling in NetworkInfo for a task when 
it is known that the task is not reachable throughout the cluster via that 
address. Essentially, NetworkInfo could be interpreted as a boolean: its 
presence means the task is addressable, and its absence means it is not. In 
practice, this would mean it gets filled in for CNI tasks, netmodules tasks, 
and docker tasks bound to the host networking namespace. It would not get 
filled in for docker bridge tasks.
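
A rough sketch of that rule, assuming a hypothetical set of network mode 
labels and a hypothetical `ip_address` key (neither is taken from the Mesos 
code base), could look like this:

```python
from typing import Optional

# Modes the proposal treats as reachable throughout the cluster.
# The labels are illustrative, not real Mesos identifiers.
ROUTABLE_MODES = {"cni", "netmodules", "docker_host"}

def network_info_for(mode: str, ip: str) -> Optional[dict]:
    """Populate NetworkInfo only when the address is cluster-reachable."""
    if mode in ROUTABLE_MODES:
        return {"ip_address": ip}
    return None  # e.g. docker bridge: leave NetworkInfo unset

def address_for(network_info: Optional[dict], host_ip: str) -> str:
    """Consumer side: presence of NetworkInfo means 'addressable by this IP'."""
    if network_info is not None:
        return network_info["ip_address"]
    return host_ip

print(address_for(network_info_for("cni", "192.168.10.5"), "10.0.0.7"))
print(address_for(network_info_for("docker_bridge", "172.17.0.2"), "10.0.0.7"))
```

With this convention a consumer such as Mesos-DNS would need no extra 
configuration: a missing NetworkInfo simply means it should fall back to the 
host IP.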

I believe this change would be fairly minimal in scope. To implement it, Mesos 
would need to be changed to not store docker bridge IPs in NetworkInfo.

I'm also open to discussion and other suggestions on how to resolve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
