[ 
https://issues.apache.org/jira/browse/MESOS-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087526#comment-16087526
 ] 

Carles Figuerola commented on MESOS-7628:
-----------------------------------------

Sorry if I wasn't specific enough about my usage of both parameters. This setup 
is in an environment where the masters and agents can't find each other by 
hostname, only by IP.

Initial state:
* --ip = <IP>
* --advertise_ip = parameter is not there
* --hostname = <IP>

This setup works as expected: the agents register with the Mesos master using 
their IP, and tasks are scheduled on them. Then we switch to this:
* --ip = parameter is not there
* --advertise_ip = <IP>
* --hostname = <IP>

In this case we don't use the {{--ip}} flag, but the Mesos master should still 
be able to communicate with the agent, since {{--advertise_ip}} is the address 
where the agent can be found. We prefer this so that the agent binds to 
0.0.0.0 instead of the server's public IP.
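
For reference, the two configurations above could be sketched as the 
following command lines. This is only an illustration: {{<IP>}} and 
{{<MASTER>}} are placeholders, the {{--master}} value is an assumption (it is 
not shown in this report), and the real agents may set these flags through 
other means such as environment variables or flag files.
{code}
# Initial state: the agent binds only to <IP>
mesos-slave --master=<MASTER> --ip=<IP> --hostname=<IP>

# After the change: no --ip, so the agent binds to 0.0.0.0
# and advertises <IP> to the masters
mesos-slave --master=<MASTER> --advertise_ip=<IP> --hostname=<IP>
{code}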

Thanks

> Changing from --ip to --advertise_ip makes the mesos-slaves not take any new 
> jobs
> ---------------------------------------------------------------------------------
>
>                 Key: MESOS-7628
>                 URL: https://issues.apache.org/jira/browse/MESOS-7628
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.28.1
>         Environment: CentOS Linux release 7.2.1511 (Core) 
>            Reporter: Carles Figuerola
>
> We had been running an extensive environment with all the Mesos agents using 
> the --ip flag so the masters could find them. Because this makes the agent 
> bind only to that IP, calls to http://localhost:5051 don't work. We found that 
> replacing it with --advertise_ip keeps the agents findable by the masters 
> while the process binds to 0.0.0.0 instead. After making this change in a live 
> environment, the masters won't schedule any tasks on the agents:
> master log:
> {code}
> Jun 06 14:30:16 mesosmst002.us-west-2.lab.example.com mesos-master[869]: 
> E0606 14:30:16.905573   918 process.cpp:1958] Failed to shutdown socket with 
> fd 45: Transport endpoint is not connected
> Jun 06 14:32:24 mesosmst002.us-west-2.lab.example.com mesos-master[869]: 
> E0606 14:32:24.137552   918 process.cpp:1958] Failed to shutdown socket with 
> fd 29: Transport endpoint is not connected
> Jun 06 14:32:41 mesosmst002.us-west-2.lab.example.com mesos-master[869]: 
> E0606 14:32:41.033612   918 process.cpp:1958] Failed to shutdown socket with 
> fd 45: Transport endpoint is not connected
> {code}
> agent logs:
> {code}
> Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:32:37.103865 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 24: Transport endpoint is not connected
> Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:32:37.103961 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 23: Transport endpoint is not connected
> Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:32:37.104019 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 21: Transport endpoint is not connected
> Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:32:37.104082 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 15: Transport endpoint is not connected
> Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:34:47.151888 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 24: Transport endpoint is not connected
> Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:34:47.152065 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 23: Transport endpoint is not connected
> Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:34:47.152196 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 21: Transport endpoint is not connected
> Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: 
> E0606 14:34:47.152262 26516 process.cpp:1958] Failed to shutdown socket with 
> fd 15: Transport endpoint is not connected
> {code}
> When testing this in another region on a new cluster with this flag enabled, 
> the tasks get scheduled and the system works as expected.
> Any help is appreciated, thanks


