> libprocess should always bind to 0.0.0.0
+1 for this.

On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu <yujie....@gmail.com> wrote:

> Hi folks,
>
> I was in the process of cleaning up some tech debt related to env variables
> in our code base. I created an epic ticket
> <https://issues.apache.org/jira/browse/MESOS-6341> to track it. I searched
> for related tickets filed earlier and found MESOS-3740
> <https://issues.apache.org/jira/browse/MESOS-3740>. I did some digging on
> how we currently handle LIBPROCESS_IP, and here are my findings:
>
> 1) We always set LIBPROCESS_IP in the executor environment variables:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
>
> This is not an issue for an executor that runs on the host network. However,
> if the executor wants to run on a non-host network (e.g., overlay), this
> might be problematic, because libprocess for the executor will try to bind
> to LIBPROCESS_IP, but that IP is not valid inside the container.
>
> 2) As mentioned in MESOS-3740
> <https://issues.apache.org/jira/browse/MESOS-3740>, some users want to run
> a Mesos framework in a Mesos container. The old-style framework driver
> assumes a two-way communication channel between the framework and the Mesos
> master. In order for the master to reach the framework running inside a
> Mesos container, the framework's libprocess should advertise its IP and
> port properly. This gets tricky because the right behavior depends on the
> networking of the Mesos container:
>
> 2.a) If the container uses the host network, libprocess should bind to
> 0.0.0.0 and advertise itself using the agent IP and the relevant port.
> 2.b) If the container has a routable IP (e.g., using calico or overlay),
> libprocess should still bind to 0.0.0.0, and advertise itself using the
> container IP and the relevant port. Currently, it binds to the agent IP
> (which will fail), and advertises itself using the agent IP and the port in
> the container (which will fail as well).
> 2.c) If the container has a private IP (e.g., bridge), libprocess should
> still bind to 0.0.0.0, and advertise itself using the agent IP and the
> _mapped_ host port. Currently, it binds to the agent IP (which will fail),
> and advertises itself using the agent IP and the port in the container
> (which will fail as well).
>
> Therefore, the workaround
> <https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
> suggested in MESOS-3740 <https://issues.apache.org/jira/browse/MESOS-3740>
> is not ideal. It does not consider 2.b) and 2.c).
>
> Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP so
> the bind address does not have to be the address that is being advertised.
>
> For the 2.c) case, Mesos doesn't have a way to determine the advertise port
> (the mapped port). This information is only known to the framework (i.e.,
> which host port it'll use as the mapped port for libprocess).
>
> Given that, I think Mesos should not blindly set LIBPROCESS_IP to the agent
> IP in the executor environment variables. The framework should be the one
> that sets LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> appropriately if it tries to launch another Mesos framework, so that the
> master can reach the new framework. If the framework just wants to launch a
> regular container that does not depend on libprocess, it should simply not
> set these env variables.
>
> Also, I think libprocess should always bind to 0.0.0.0, rather than doing a
> hostname lookup and binding to the IP found for the hostname.
> LIBPROCESS_ADVERTISE_IP can be used to override the IP address it
> advertises to peers. If that's not specified, it'll do a hostname lookup to
> guess a routable IP.
>
> Thoughts?
> - Jie
>
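
To make the proposal above concrete, here is a minimal sketch in Python (not libprocess code) of the three cases 2.a) to 2.c) and of the "always bind to 0.0.0.0, advertise separately" rule. The function names `advertise_env` and `resolve_addresses`, the mode strings, and the injectable `lookup` parameter are all hypothetical and only for illustration; only the environment variable names LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT come from the thread.

```python
import socket
from typing import Callable, Dict, Mapping, Optional, Tuple


def advertise_env(mode: str, agent_ip: str,
                  container_ip: Optional[str] = None,
                  mapped_port: Optional[int] = None) -> Dict[str, str]:
    """Env vars a framework might set per network mode (hypothetical helper)."""
    if mode == "host":       # 2.a) advertise the agent IP, port as bound
        return {"LIBPROCESS_ADVERTISE_IP": agent_ip}
    if mode == "routable":   # 2.b) advertise the container IP, port as bound
        if container_ip is None:
            raise ValueError("routable mode needs container_ip")
        return {"LIBPROCESS_ADVERTISE_IP": container_ip}
    if mode == "bridge":     # 2.c) advertise the agent IP and the mapped host port
        if mapped_port is None:
            raise ValueError("bridge mode needs mapped_port")
        return {"LIBPROCESS_ADVERTISE_IP": agent_ip,
                "LIBPROCESS_ADVERTISE_PORT": str(mapped_port)}
    raise ValueError("unknown network mode: " + mode)


def _guess_routable_ip() -> str:
    # Fallback from the proposal: hostname lookup to guess a routable IP.
    return socket.gethostbyname(socket.gethostname())


def resolve_addresses(
    env: Mapping[str, str],
    bound_port: int,
    lookup: Callable[[], str] = _guess_routable_ip,
) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """Return ((bind_ip, bind_port), (advertise_ip, advertise_port))."""
    bind_ip = "0.0.0.0"  # proposal: always bind to the wildcard address
    advertise_ip = env.get("LIBPROCESS_ADVERTISE_IP")
    if advertise_ip is None:
        advertise_ip = lookup()
    # Assumption: without an explicit advertise port, the bound port is advertised.
    advertise_port = int(env.get("LIBPROCESS_ADVERTISE_PORT", bound_port))
    return (bind_ip, bound_port), (advertise_ip, advertise_port)
```

For example, a framework launching another framework on a bridge network would pass `advertise_env("bridge", agent_ip, mapped_port=...)` into the container environment, since only the framework knows the mapped host port.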



-- 
Best Regards,
Haosdent Huang
