> > Does your proposal include this?
If we don't set LIBPROCESS_IP, by default, it'll bind to 0.0.0.0.

- Jie

On Sun, Oct 16, 2016 at 7:09 PM, haosdent <haosd...@gmail.com> wrote:

> Sounds great!
>
> >libprocess should always bind to 0.0.0.0
>
> Does your proposal include this?
>
> On Mon, Oct 17, 2016 at 2:12 AM, Jie Yu <yujie....@gmail.com> wrote:
>
> > OK, guys. Thanks for the input! Here is my proposal:
> >
> > 1) If the container uses host network, Mesos agent will set
> > LIBPROCESS_ADVERTISE_IP to agent IP. This is for the case where DNS is
> > not configured properly on the host (we don't need to do that if DNS is
> > configured properly). By doing this, libprocess will skip hostname
> > lookup and advertise LIBPROCESS_ADVERTISE_IP directly.
> >
> > 2) If the container uses non-host network, and defines port mapping
> > (e.g., bridge), Mesos agent will not set any libprocess env variables.
> > Given that there could be multiple mapped ports, Mesos agent doesn't
> > know how to set LIBPROCESS_ADVERTISE_PORT. So it's the framework's
> > responsibility to set LIBPROCESS_ADVERTISE_IP and
> > LIBPROCESS_ADVERTISE_PORT properly in this case (through
> > CommandInfo.environment).
> >
> > 3) If the container uses non-host network, and does not define port
> > mapping (e.g., ip per container), Mesos agent will not set any
> > libprocess env variables. In this case, both CNI isolator and docker
> > engine will properly set up DNS in the container so hostname lookup
> > should work properly.
> >
> > - Jie
> >
> > On Sat, Oct 15, 2016 at 4:01 PM, tommy xiao <xia...@gmail.com> wrote:
> >
> > > good point, +1
> > >
> > > 2016-10-13 0:27 GMT+08:00 Jie Yu <yujie....@gmail.com>:
> > >
> > > > Stephan,
> > > >
> > > > I think the only time the framework needs to set
> > > > LIBPROCESS_ADVERTISE_IP is when DNAT is necessary for the container
> > > > (e.g., bridge). In that case, LIBPROCESS_ADVERTISE_IP should always
> > > > be the agent ip, and LIBPROCESS_ADVERTISE_PORT the relevant host
> > > > port allocated for the container.
> > > > For other cases, framework should not do anything.
> > > >
> > > > - Jie
> > > >
> > > > On Wed, Oct 12, 2016 at 4:43 AM, Erb, Stephan
> > > > <stephan....@blue-yonder.com> wrote:
> > > >
> > > > > >Framework should be the one that sets
> > > > > >LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> > > > > >appropriately if it tries to launch another Mesos framework so
> > > > > >that Master can reach the new framework.
> > > > >
> > > > > As a framework/executor author this is not possible in all
> > > > > scenarios: There is no way to discover IP addresses assigned via
> > > > > CNI before the first StatusUpdate has been received. It is
> > > > > therefore not possible to set LIBPROCESS_ADVERTISE_IP
> > > > > appropriately at launch time.
> > > > >
> > > > > Please see https://issues.apache.org/jira/browse/MESOS-6281 for
> > > > > details.
> > > > >
> > > > > On 12/10/16 06:42, "Avinash Sridharan" <avin...@mesosphere.io> wrote:
> > > > >
> > > > > Valid point. Makes sense to drive this decision from the user and
> > > > > the framework.
> > > > >
> > > > > On Tue, Oct 11, 2016 at 9:32 PM, Jie Yu <yujie....@gmail.com> wrote:
> > > > >
> > > > > > > While I believe this particular logic of setting
> > > > > > > LIBPROCESS_ADVERTISE_IP to agent IP can be done in the agent
> > > > > > > (it could look at the port mapping as well)
> > > > > >
> > > > > > What if there are multiple port mappings? How can the agent
> > > > > > decide which port to be used as LIBPROCESS_ADVERTISE_PORT?
> > > > > >
> > > > > > On Tue, Oct 11, 2016 at 9:27 PM, Avinash Sridharan
> > > > > > <avin...@mesosphere.io> wrote:
> > > > > >
> > > > > > > Definitely a +1 for executor binding to 0.0.0.0, instead of
> > > > > > > doing a `gethostname` and `getaddrinfo`.
> > > > > > > But I am assuming these semantics would kick in only if
> > > > > > > LIBPROCESS_IP is not set, which should be the norm.
> > > > > > >
> > > > > > > +1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> > > > > > > and the onus being on the frameworks to set these variables. I
> > > > > > > guess the framework can set the LIBPROCESS_ADVERTISE_IP to the
> > > > > > > agent IP and LIBPROCESS_ADVERTISE_PORT to the host port when it
> > > > > > > specifies a port-mapping. While I believe this particular logic
> > > > > > > of setting LIBPROCESS_ADVERTISE_IP to agent IP can be done in
> > > > > > > the agent (it could look at the port mapping as well), when to
> > > > > > > actually set these variables (whether the executors even need
> > > > > > > to advertise their IP addresses) is a decision that the
> > > > > > > Frameworks should be privy to and not left to the agent.
> > > > > > >
> > > > > > > On Tue, Oct 11, 2016 at 7:31 PM, haosdent <haosd...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > libprocess should always bind to 0.0.0.0
> > > > > > > > + 1 for this
> > > > > > > >
> > > > > > > > On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu <yujie....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > I was in the process of cleaning up some tech debt related
> > > > > > > > > to env variables in our code base. I created an epic ticket
> > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-6341> to
> > > > > > > > > track. I searched relevant tickets filed previously, and
> > > > > > > > > found MESOS-3740
> > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-3740>.
> > > > > > > > > I did some digging on how we handle LIBPROCESS_IP
> > > > > > > > > currently, and here are my findings:
> > > > > > > > >
> > > > > > > > > 1) We always set LIBPROCESS_IP in the executor environment
> > > > > > > > > variables:
> > > > > > > > > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
> > > > > > > > >
> > > > > > > > > This is not an issue for an executor that runs on host
> > > > > > > > > network. However, if the executor wants to run on non-host
> > > > > > > > > network (e.g., overlay), this might be problematic, because
> > > > > > > > > libprocess for the executor will try to bind to
> > > > > > > > > LIBPROCESS_IP, but the IP is not valid inside the
> > > > > > > > > container.
> > > > > > > > >
> > > > > > > > > 2) As mentioned in MESOS-3740
> > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-3740>, some
> > > > > > > > > users want to run a Mesos framework in a Mesos container.
> > > > > > > > > The old style framework driver assumes a 2 way
> > > > > > > > > communication channel between the framework and the Mesos
> > > > > > > > > master. In order for the master to reach the framework
> > > > > > > > > running inside a Mesos container, the framework's
> > > > > > > > > libprocess should advertise its ip and port properly.
> > > > > > > > > This problem gets tricky because of the networking for the
> > > > > > > > > Mesos container:
> > > > > > > > >
> > > > > > > > > 2.a) If the container uses host network, libprocess should
> > > > > > > > > bind to 0.0.0.0, and advertise itself using the agent ip
> > > > > > > > > and the relevant port
> > > > > > > > > 2.b) If the container has a routable ip (e.g., using calico
> > > > > > > > > or overlay), libprocess should still bind to 0.0.0.0, and
> > > > > > > > > advertise itself using the container ip and the relevant
> > > > > > > > > port. Currently, it binds to agent ip (which will fail),
> > > > > > > > > and advertises itself using agent ip and the port in the
> > > > > > > > > container (which will fail as well)
> > > > > > > > > 2.c) If the container has a private ip (e.g., bridge),
> > > > > > > > > libprocess should still bind to 0.0.0.0, and advertise
> > > > > > > > > itself using the agent ip and _mapped_ host port.
> > > > > > > > > Currently, it binds to agent ip (which will fail), and
> > > > > > > > > advertises itself using agent ip and the port in the
> > > > > > > > > container (which will fail as well)
> > > > > > > > >
> > > > > > > > > Therefore, the workaround
> > > > > > > > > <https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
> > > > > > > > > suggested in MESOS-3740
> > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-3740> is not
> > > > > > > > > ideal.
> > > > > > > > > It does not consider 2.b) and 2.c).
> > > > > > > > >
> > > > > > > > > Libprocess now supports both LIBPROCESS_IP and
> > > > > > > > > LIBPROCESS_ADVERTISE_IP so the bind address does not have
> > > > > > > > > to be the address that is being advertised.
> > > > > > > > >
> > > > > > > > > For the 2.c) case, Mesos doesn't have a way to determine
> > > > > > > > > the advertise port (mapped port). This information is only
> > > > > > > > > known to the framework (which host port it'll use to serve
> > > > > > > > > as the mapped port for the libprocess).
> > > > > > > > >
> > > > > > > > > Given that, I think Mesos should not blindly set
> > > > > > > > > LIBPROCESS_IP to agent IP in executor environment
> > > > > > > > > variables. Framework should be the one that sets
> > > > > > > > > LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> > > > > > > > > appropriately if it tries to launch another Mesos framework
> > > > > > > > > so that Master can reach the new framework. If the
> > > > > > > > > framework just wants to launch a regular container that
> > > > > > > > > does not depend on libprocess, it should simply not set
> > > > > > > > > these env variables.
> > > > > > > > >
> > > > > > > > > Also, I think libprocess should always bind to 0.0.0.0,
> > > > > > > > > rather than doing a hostname lookup and binding to the IP
> > > > > > > > > found for the hostname. LIBPROCESS_ADVERTISE_IP can be used
> > > > > > > > > to override the ip address it wants to advertise to peers.
> > > > > > > > > If that's not specified, it'll try to do a hostname lookup
> > > > > > > > > to guess a routable ip.
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > - Jie
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards,
> > > > > > > > Haosdent Huang
> > > > > > >
> > > > > > > --
> > > > > > > Avinash Sridharan, Mesosphere
> > > > > > > +1 (323) 702 5245
> > > > >
> > > > > --
> > > > > Avinash Sridharan, Mesosphere
> > > > > +1 (323) 702 5245
> > >
> > > --
> > > Deshi Xiao
> > > Twitter: xds2000
> > > E-mail: xiaods(AT)gmail.com
>
> --
> Best Regards,
> Haosdent Huang
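
For readers skimming the thread, the three cases in the proposal and the bind/advertise behaviour discussed above can be sketched roughly as follows. This is an illustration in Python, not Mesos or libprocess code: `NetworkMode`, `agent_libprocess_env`, and `resolve_addresses` are hypothetical names, and the fallback of advertising the bind address when only LIBPROCESS_IP is set is an assumption rather than something stated in the thread.

```python
import socket
from enum import Enum
from typing import Callable, Dict, Mapping, Optional, Tuple


class NetworkMode(Enum):
    """Hypothetical labels for the three cases in the proposal."""
    HOST = "host"                          # case 1: host network
    PORT_MAPPED = "port_mapped"            # case 2: non-host network with port mapping (e.g., bridge)
    IP_PER_CONTAINER = "ip_per_container"  # case 3: non-host network, no port mapping


def agent_libprocess_env(mode: NetworkMode, agent_ip: str) -> Dict[str, str]:
    """Env variables the agent would add under the proposal (illustrative only)."""
    if mode is NetworkMode.HOST:
        # Lets libprocess skip the hostname lookup when host DNS is misconfigured.
        return {"LIBPROCESS_ADVERTISE_IP": agent_ip}
    # Cases 2 and 3: the agent sets nothing. In case 2 the framework is expected
    # to pass LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT itself.
    return {}


def resolve_addresses(
    env: Mapping[str, str],
    lookup: Optional[Callable[[], str]] = None,
) -> Tuple[str, str]:
    """Return (bind_ip, advertise_ip) following the behaviour described in the thread."""
    # Without LIBPROCESS_IP, bind to all interfaces.
    bind_ip = env.get("LIBPROCESS_IP", "0.0.0.0")

    advertise_ip = env.get("LIBPROCESS_ADVERTISE_IP")
    if advertise_ip is not None:
        return bind_ip, advertise_ip

    if bind_ip != "0.0.0.0":
        # Assumption: with only LIBPROCESS_IP set, the bind address is advertised.
        return bind_ip, bind_ip

    # Otherwise guess a routable ip via a hostname lookup.
    if lookup is None:
        lookup = lambda: socket.gethostbyname(socket.gethostname())
    return bind_ip, lookup()
```

For example, `resolve_addresses(agent_libprocess_env(NetworkMode.HOST, "10.0.0.5"))` yields a bind address of 0.0.0.0 and an advertised address of 10.0.0.5 without any hostname lookup, which is the point of case 1.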