Carles Figuerola created MESOS-7628:
---------------------------------------
Summary: Changing from --ip to --advertise_ip makes the mesos-slaves not take any new jobs
Key: MESOS-7628
URL: https://issues.apache.org/jira/browse/MESOS-7628
Project: Mesos
Issue Type: Bug
Affects Versions: 0.28.1
Environment: CentOS Linux release 7.2.1511 (Core)
Reporter: Carles Figuerola
We have been running a large environment in which every Mesos agent uses the
--ip flag so that the masters can find it. Because --ip makes the agent bind
to only that IP, calls to http://localhost:5051 do not work. We found that
replacing --ip with --advertise_ip keeps the agents findable by the masters
while the process binds to 0.0.0.0 instead. After making this change in a
live environment, however, the masters no longer schedule any tasks on the
agents:
master log:
{code}
Jun 06 14:30:16 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:30:16.905573 918 process.cpp:1958] Failed to shutdown socket with fd 45: Transport endpoint is not connected
Jun 06 14:32:24 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:32:24.137552 918 process.cpp:1958] Failed to shutdown socket with fd 29: Transport endpoint is not connected
Jun 06 14:32:41 mesosmst002.us-west-2.lab.example.com mesos-master[869]: E0606 14:32:41.033612 918 process.cpp:1958] Failed to shutdown socket with fd 45: Transport endpoint is not connected
{code}
agent logs:
{code}
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.103865 26516 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.103961 26516 process.cpp:1958] Failed to shutdown socket with fd 23: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.104019 26516 process.cpp:1958] Failed to shutdown socket with fd 21: Transport endpoint is not connected
Jun 06 14:32:37 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:32:37.104082 26516 process.cpp:1958] Failed to shutdown socket with fd 15: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.151888 26516 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152065 26516 process.cpp:1958] Failed to shutdown socket with fd 23: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152196 26516 process.cpp:1958] Failed to shutdown socket with fd 21: Transport endpoint is not connected
Jun 06 14:34:47 ip-10-24-XX-XX.us-west-2.lab.example.com mesos-slave[26507]: E0606 14:34:47.152262 26516 process.cpp:1958] Failed to shutdown socket with fd 15: Transport endpoint is not connected
{code}
When testing this in another region on a new cluster with this flag enabled,
the tasks get scheduled and the system works as expected.
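To compare the failing cluster with the working one, a plain TCP reachability
check against the agent port may help narrow things down. The following is
only a rough sketch and not part of any Mesos tooling: `can_connect` and
`check_agent` are hypothetical helper names, and 5051 is assumed to be the
agent port as in the localhost URL above. It tests whether a TCP connection
can be opened, nothing more; it does not inspect what address the agent
actually advertises to the masters.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_agent(advertised_ip: str, port: int = 5051) -> dict:
    """Probe the agent port on the advertised address and on loopback.

    advertised_ip is whatever value was passed to --advertise_ip
    (an assumption of this sketch; nothing is read from Mesos itself).
    """
    return {
        "advertised": can_connect(advertised_ip, port),
        "localhost": can_connect("127.0.0.1", port),
    }
```

Run on the agent host, `check_agent("<advertised ip>")` would be expected to
report both entries as reachable if the process really binds to 0.0.0.0.
Running `can_connect("<advertised ip>", 5051)` from a master host checks the
master-to-agent direction separately.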
Any help is appreciated, thanks.