[
https://issues.apache.org/jira/browse/MESOS-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292465#comment-15292465
]
Dan Osborne commented on MESOS-5325:
------------------------------------
[~aavinash] I've had a think about this, and I don't believe your initial point
is true. NetworkInfo is a repeated field in the ContainerInfo. By default, it
will be an empty list. If it has at least one initialized NetworkInfo in it,
this means the User/Framework has made a "Network Request", and therefore, the
task should be reachable.
https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L1542
Therefore, we should not be extracting IPs and filling them into the
NetworkInfo's IPAddress field if no NetworkInfo was initialized.
Marathon, for example, already exhibits this behavior. If the ipAddress field
is specified in Marathon, it flags that this is an ip-per-container task, so it
will add a NetworkInfo to the repeated network_infos to request an IP Address.
If the field is not specified, then Marathon leaves the network_infos as an
empty list. In this case, it's up to the user not to launch a Docker Bridge task
using Marathon while also specifying the ipAddress field.
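As a rough sketch of the rule I'm describing (Python, using simplified
stand-in classes rather than the real Mesos protobuf types; the names
requested_network and record_container_ip are hypothetical and do not exist in
Mesos):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkInfo:
    # Simplified stand-in for the Mesos NetworkInfo message (assumption).
    ip_address: Optional[str] = None


@dataclass
class ContainerInfo:
    # Models the repeated network_infos field: an empty list by default.
    network_infos: List[NetworkInfo] = field(default_factory=list)


def requested_network(container: ContainerInfo) -> bool:
    """A non-empty network_infos means the framework made a network request."""
    return len(container.network_infos) > 0


def record_container_ip(container: ContainerInfo, discovered_ip: str) -> None:
    """Fill in the discovered IP only if a NetworkInfo was initialized."""
    if not requested_network(container):
        # No network request (e.g. a Docker bridge task): leave the list empty.
        return
    for info in container.network_infos:
        if info.ip_address is None:
            info.ip_address = discovered_ip
```

With this rule, a Docker bridge task launched without a NetworkInfo keeps an
empty network_infos, so a consumer such as Mesos-DNS would have nothing
unreachable to resolve to.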
> Mesos can't determine if task IP is reachable
> ---------------------------------------------
>
> Key: MESOS-5325
> URL: https://issues.apache.org/jira/browse/MESOS-5325
> Project: Mesos
> Issue Type: Bug
> Reporter: Dan Osborne
>
> I have uncovered a design flaw that affects ip-per-container tasks when run
> in a cluster alongside non ip-per-container tasks. This affects
> docker-libnetwork, netmodules, and I suspect it will also affect CNI.
> After Mesos launches a docker bridge task, it fills the task's NetworkInfo
> field with the docker bridge IP assigned to that task. Because of this
> behavior, when a launched task's NetworkInfo is later utilized by Mesos
> components, it is unknown if it is filled with an IP address accessible
> throughout the cluster, or if it is not.
> A common use case where this is a problem can be encountered when using Mesos
> DNS. Mesos-DNS has a configuration setting that tells it which information to
> respond to a query with: NetworkInfo, or HostIP. If it has been configured to
> prefer NetworkInfo, it correctly resolves ip-per-container containers to
> their unique IP. But, because the docker bridge IP is also stored in
> NetworkInfo, it will incorrectly resolve docker-bridge containers to an IP
> address not accessible from anywhere besides the slave they are on. This
> breaks DNS resolutions in Mesos.
> I believe Mesos needs a way to distinguish between tasks which are accessible
> via their IP and tasks that are not.
> One fix would be to prevent Mesos from filling in NetworkInfo for a task if
> it is known that the task is not reachable throughout the cluster via that
> address. Essentially, NetworkInfo could be interpreted as a boolean: its
> presence means this task is addressable; its absence means the task is not.
> In practice, this would mean it gets filled in for CNI tasks, netmodules
> tasks, and docker tasks bound to the host networking namespace. It would not
> get filled in for docker bridge tasks.
> I believe this change would be fairly minimal in scope. To implement it,
> Mesos would need to be changed to not store Docker Bridge IPs in NetworkInfo.
> I'm also open to discussion and other suggestions on how to resolve this.