[ https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Reiss reassigned MESOS-165:
-----------------------------------
Assignee: Charles Reiss
> Slaves die after initial registration with master with "Network is unreachable" error
> -------------------------------------------------------------------------------------
>
> Key: MESOS-165
> URL: https://issues.apache.org/jira/browse/MESOS-165
> Project: Mesos
> Issue Type: Bug
> Components: master, slave
> Environment: Scientific Linux 6.2 internal cluster
> Reporter: Jessica J
> Assignee: Charles Reiss
> Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so
> when I start the master, I set --ip to one of its internal IP addresses so
> that it can communicate with its slaves. I have also tried setting this IP
> address through LIBPROCESS_IP in mesos-env.sh (in the deploy directory),
> but each time the master starts, it reports that it is running at the external
> IP address (as if it were ignoring the --ip and LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address
> (regardless of the address the master reports), so the initial connection
> succeeds. (Both the slave and the master log messages saying the connection
> was successful.) However, after registering, the slave *immediately* dies.
> My guess is that once the connection succeeds, the master tells the slave to
> communicate with it on the external IP address, but since the slave has no
> Internet access, all further communication fails.
> The slave prints the following error when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect: Network is unreachable [101]
> *** Check failure stack trace: ***
> @ 0x7f7d6be3342d google::LogMessage::Fail()
> @ 0x7f7d6be36ae7 google::LogMessage::SendToLog()
> @ 0x7f7d6be36066 google::LogMessage::Flush()
> @ 0x7f7d6be36279 google::LogMessage::~LogMessage()
> @ 0x7f7d6be39351 google::ErrnoLogMessage::~ErrnoLogMessage()
> @ 0x7f7d6be47319 process::SocketManager::link()
> @ 0x7f7d6be4bc88 process::ProcessManager::link()
> @ 0x7f7d6be4ed98 process::ProcessBase::link()
> @ 0x7f7d6bcaf575 mesos::internal::slave::Slave::newMasterDetected()
> @ 0x7f7d6bcbbd7f ProtobufProcess<>::handler1<>()
> @ 0x7f7d6bcbe477 ProtobufProcess<>::visit()
> @ 0x7f7d6be504e0 process::MessageEvent::visit()
> @ 0x7f7d6be4b448 process::ProcessManager::resume()
> @ 0x7f7d6be43bae process::schedule()
> @ 0x7f7d6b5a77f1 start_thread
> @ 0x7f7d6a93c92d clone
> Aborted
> I have looked through the code (master.cpp, process.cpp, main.cpp, slave.cpp,
> mesos-master.sh, etc.) to find out why the --ip option is being ignored, but
> so far I have not been able to determine the cause.
--
This message is automatically generated by JIRA.